Article ID: 4963646
Journal: Astronomy and Computing
Published Year: 2017
Pages: 9
File Type: PDF
Abstract
We discuss a memory-efficient construction of an oct-tree structure to sample the mass densities with locally adaptive resolution, according to the features of the underlying tetrahedral tessellation. We propose an efficient GPU implementation of the computationally intensive operation of intersecting the tetrahedra with the cubical cells of the deposit grid, which achieves a speedup of almost an order of magnitude compared to an optimized CPU version. We discuss two dynamic load-balancing schemes: the first exchanges particle data between cluster nodes and deposits the tetrahedra for each block of the grid structure on single nodes, whereas the second uses global reduction operations to obtain the total masses. We demonstrate the scalability of our algorithms with up to 256 GPUs and TB-sized simulation snapshots, resulting in tessellations with more than 400 billion tetrahedra.
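As a rough illustration of the second load-balancing scheme, the sketch below has each "node" deposit its own share of tetrahedra onto a full copy of a small uniform grid, after which an elementwise sum stands in for the global reduction. It is not the paper's implementation: the function names (`tet_volume`, `deposit`, `reduce_sum`) are hypothetical, the grid is uniform rather than an oct-tree, and each tetrahedron's mass is assigned to the single cell containing its centroid, which is a crude substitute for the exact tetrahedron-cell intersection described above.

```python
def tet_volume(a, b, c, d):
    """Unsigned volume of a tetrahedron given four 3D vertices."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # Scalar triple product u . (v x w)
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def deposit(tets, masses, n, box=1.0):
    """Deposit tetrahedron masses onto an n^3 grid covering [0, box)^3.

    Assumption: all mass goes to the cell holding the centroid.
    An exact deposit would split the mass by intersection volume.
    """
    grid = [0.0] * (n * n * n)
    for tet, m in zip(tets, masses):
        centroid = [sum(vtx[i] for vtx in tet) / 4.0 for i in range(3)]
        ix, iy, iz = (min(n - 1, max(0, int(x / box * n))) for x in centroid)
        grid[(ix * n + iy) * n + iz] += m
    return grid


def reduce_sum(partial_grids):
    """Elementwise sum of per-node grids (stand-in for a global reduction)."""
    return [sum(cells) for cells in zip(*partial_grids)]


# Two tetrahedra, one per simulated node.
tets = [
    ((0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.0, 0.4, 0.0), (0.0, 0.0, 0.4)),
    ((0.6, 0.6, 0.6), (1.0, 0.6, 0.6), (0.6, 1.0, 0.6), (0.6, 0.6, 1.0)),
]
masses = [1.0, 2.0]

node_a = deposit(tets[:1], masses[:1], n=2)
node_b = deposit(tets[1:], masses[1:], n=2)
total = reduce_sum([node_a, node_b])
```

In this toy setup the reduced grid matches what a single node would obtain by depositing all tetrahedra itself, and the total mass is conserved. On a real cluster the reduction would be a collective operation across nodes rather than a local sum.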
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science Applications