
The emergence of exascale computing, driven largely by GPU acceleration, has transformed high-performance computing (HPC). Declarative languages like Datalog naturally benefit from this evolution: simple recursive rules can be compiled into efficient GPU-optimized relational algebra operations. Unlike SQL, Datalog executes queries iteratively until a fixed point is reached, making it well suited to graph mining, deductive databases, and program analysis. Existing engines such as SLOG, LogicBlox, and Soufflé target multi-core CPU architectures and lack support for multi-node, multi-GPU environments. Our research addresses this gap by developing the first multi-GPU, multi-node Datalog engine, combining CUDA for intra-node parallelism with MPI for inter-node communication. We introduce GPU-parallel implementations of relational joins, scalable recursive aggregation, and novel iterative all-to-all communication strategies. Evaluations on Argonne’s Polaris supercomputer achieved speedups of up to 32× over state-of-the-art distributed Datalog engines. These results highlight potential extensions into domains such as topological data analysis and visual analytics, establishing a foundation for declarative analytics on future HPC platforms.
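To make the fixed-point semantics concrete, here is a minimal sketch of semi-naive evaluation for the classic Datalog transitive-closure program (`path(x, y) :- edge(x, y).` and `path(x, z) :- path(x, y), edge(y, z).`). This is an illustrative sequential Python model of the join-until-fixed-point pattern that such engines compile to relational algebra, not the paper's GPU implementation; the function name `transitive_closure` is our own.

```python
def transitive_closure(edges):
    """Iterate join/union over new facts until no new facts appear (fixed point)."""
    # Index edge facts by source vertex for the join path(x, y) |><| edge(y, z).
    by_src = {}
    for (u, v) in edges:
        by_src.setdefault(u, set()).add(v)

    path = set(edges)    # full derived relation so far
    delta = set(edges)   # facts newly derived in the previous iteration
    while delta:         # fixed point reached when an iteration adds nothing
        new = set()
        for (x, y) in delta:           # semi-naive: join only the delta
            for z in by_src.get(y, ()):
                if (x, z) not in path:
                    new.add((x, z))
        path |= new
        delta = new
    return path

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# → [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The semi-naive restriction, joining only facts derived in the previous iteration rather than the whole relation, is what keeps iterated joins tractable and maps naturally onto batched GPU relational operators.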