Article ID: 486674
Journal: Procedia Computer Science
Published Year: 2012
Pages: 10 Pages
File Type: PDF
Abstract

We have implemented a fast collisionless N-body code that runs on a GPU. Its peak performance reaches 767 GFLOPS (74% of the theoretical peak performance of our measurement environment), assuming a computational cost of 26 floating-point operations per interaction. Our implementation is up to 1.7 times faster than the CUDA SDK implementation (in the low-N region), owing to our proposed algorithm for force accumulation without synchronization. Detailed performance analysis shows that the performance of collisionless N-body simulations on a GPU is governed by only two quantities: the number of running streaming multiprocessors, and the ratio of the clock cycles spent on global-memory access latency to those spent on computing gravitational interactions.
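
To make the performance accounting concrete, the following is a minimal CPU-side sketch in Python, not the authors' GPU code. It assumes a direct-summation scheme with Plummer softening, units with G = 1, and an interaction count of N² per step (self-interactions included, contributing zero force). The function names, the `eps` parameter, and these counting choices are illustrative assumptions; only the figure of 26 floating-point operations per interaction comes from the abstract.

```python
import math

# Counting convention stated in the abstract: 26 floating-point
# operations are charged for each pairwise interaction.
FLOPS_PER_INTERACTION = 26


def accelerations(pos, mass, eps=1.0e-2):
    """Softened direct-summation gravity with G = 1 (illustrative sketch).

    pos  : list of (x, y, z) tuples
    mass : list of particle masses
    eps  : hypothetical softening length; it keeps the i == j term finite,
           so that term simply adds zero to the sum.
    """
    n = len(pos)
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = mass[j] / (r2 * math.sqrt(r2))
            ax += dx * inv_r3
            ay += dy * inv_r3
            az += dz * inv_r3
        acc.append((ax, ay, az))
    return acc


def sustained_gflops(n_particles, n_steps, elapsed_seconds,
                     flops_per_interaction=FLOPS_PER_INTERACTION):
    """Convert a measured wall-clock time into GFLOPS.

    Assumes N * N interactions per step, which is an assumption of this
    sketch rather than a detail given in the abstract.
    """
    interactions = n_particles * n_particles * n_steps
    return interactions * flops_per_interaction / elapsed_seconds / 1.0e9
```

For example, under these assumptions a single step with 1000 particles that takes 0.026 s corresponds to `sustained_gflops(1000, 1, 0.026)`, or about 1 GFLOPS. A figure such as the 767 GFLOPS quoted above would be obtained in the same way, from the measured time of the GPU kernel.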

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)