Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
486674 | Procedia Computer Science | 2012 | 10 Pages | |
We have implemented a fast collisionless N-body code that runs on a GPU. Assuming a computational cost of 26 floating-point operations per interaction, the code reaches a peak performance of 767 GFLOPS, which corresponds to 74% of the theoretical peak performance of our measurement environment. Our implementation is up to 1.7 times faster than the N-body sample in the CUDA SDK, with the largest speedup occurring in the low-N regime. This gain comes from our proposed algorithm, which accumulates forces without synchronization. A detailed performance analysis shows that the performance of collisionless N-body simulations on GPUs is governed by only two quantities: the number of running streaming multiprocessors, and the ratio of the clock cycles spent on global-memory access latency to those spent calculating the gravitational interaction.
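The GFLOPS figures above come from counting pairwise interactions and multiplying by a fixed per-interaction cost. The following is a minimal CPU reference sketch of that bookkeeping, not the authors' GPU code. It does not implement their synchronization-free force accumulation. The only value taken from the abstract is the 26-flop cost. Everything else is an assumption made for illustration: Plummer softening `eps`, units with G = 1, counting all N × N pairs including self-interactions, and the hypothetical helper names `accelerations`, `gflops`, and `implied_peak`.

```python
import math
import time

# Cost per pairwise interaction assumed in the abstract when converting
# interaction rates into GFLOPS.
FLOPS_PER_INTERACTION = 26


def accelerations(pos, mass, eps=1e-2):
    """Direct-summation accelerations (units with G = 1), Plummer-softened.

    Assumption: every body i sums contributions from all N bodies, including
    itself. The softening eps keeps the self term finite, and that term adds
    zero because its separation vector is zero. Returns the list of
    (ax, ay, az) tuples and the number of interactions evaluated (N * N).
    """
    if eps <= 0.0:
        raise ValueError("eps must be positive when self-interactions are included")
    n = len(pos)
    eps2 = eps * eps
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = mass[j] / (r2 * math.sqrt(r2))  # m_j / (r^2 + eps^2)^(3/2)
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc, n * n


def gflops(n_interactions, seconds, flops_per_interaction=FLOPS_PER_INTERACTION):
    """Effective performance in GFLOPS under a fixed per-interaction cost."""
    return flops_per_interaction * n_interactions / seconds / 1e9


def implied_peak(achieved_gflops, fraction_of_peak):
    """Theoretical peak implied by an achieved rate and its fraction of peak."""
    return achieved_gflops / fraction_of_peak


if __name__ == "__main__":
    # Time a small pure-Python run. The number only illustrates the
    # bookkeeping and says nothing about GPU performance.
    bodies = [(math.cos(k), math.sin(k), 0.1 * k) for k in range(200)]
    masses = [1.0 / len(bodies)] * len(bodies)
    t0 = time.perf_counter()
    _, n_int = accelerations(bodies, masses)
    dt = time.perf_counter() - t0
    print(f"{n_int} interactions, {gflops(n_int, dt):.4f} GFLOPS (pure Python)")
    # 767 GFLOPS at 74% of peak implies a peak of roughly 1036.5 GFLOPS.
    print(f"implied peak: {implied_peak(767.0, 0.74):.1f} GFLOPS")
```

The last line shows that the abstract's two figures are mutually consistent with a theoretical peak of roughly 1.04 TFLOPS for the measurement environment. Because the reported rate scales linearly with the assumed per-interaction cost, it is only comparable with other codes that use the same 26-flop convention.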