Speeding up cosmological hydrodynamic simulations by exploiting space-filling curve

During my PhD, our numerical group was facing some slow-downs in running large cosmological hydrodynamic simulations with Gadget3 (at the time, private repo).

We used scalasca and profiled the hydrodynamic kernels runtimes of Magneticum simulations. We found that most of the time was not even spent actual physics computations, rather, most of the time was spent in the tree walk neighbour search!

Component Time [s]
Neighbour Search 263 000
Physics computation 155 000
MPI communication 72 000

The Insight

From Ragagnin et al. (2016)

Gadget3 orders its particles using a Hilbert space-filling curve for domain decomposition. The Hilbert curve has a key property: points nearby along the curve are also nearby in 3D space. So particles stored consecutively in memory are physically close, and physically close particles share most of their neighbours.

The fix, which I called Neighbour Recycling: instead of one tree walk per particle, group nearby particles together and perform a single tree walk for the whole group, over a slightly larger radius.

To handle Gadget3's wildly varying densities, the grouping radius is made adaptive:

R = f * smoothing length
In the paper Ragagnin et al. (2016) we find the optimal constant f = 0.5 was found numerically, and corresponds to each tree walk searching for roughly 3x speedup.

Results

Tested on the Magneticum box5/hr, simulation (18 Mpc/h, 2 x 813 particles, 8 MPI processes), The number of neighbour search drops drastically, close to a factor 10. Even if each search is slightly more expensive, the total speedup can still be very large.
From Ragagnin et al. (2016)

Conclusions

The technique doesn't rely on anything Gadget3-specific beyond the space-filling curve ordering, which most modern N-body codes already use. If your code spends more time finding neighbours than computing physics, this idea is worth a look.
The work was carried out at LRZ and LMU Muenchen, with Nikola Tchipev, Michael Bader, Klaus Dolag, and Nicolay Hammer, tested within the Magneticum project. It has been published at ParCo 2016 @ Edinburgh (UK).
Here below, I leave the PDF slides of a internal meeting presentation I gave in 2015.