Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
486674 | 703390 | 2012 | 10-page PDF | Free download |
We have implemented a fast collisionless N-body code that runs on a GPU. Its peak performance reaches 767 GFLOPS (74% of the theoretical peak performance of our measurement environment), assuming a computational cost of 26 floating-point operations per interaction. Our implementation is up to 1.7 times faster than the CUDA SDK implementation, with the largest gain in the low-N region, owing to our proposed algorithm for force accumulation without synchronization. A detailed performance analysis shows that the performance of collisionless N-body simulations on a GPU is governed by only two quantities: the number of running streaming multiprocessors, and the ratio of the clock cycles spent on global memory access latency to those spent on calculating the gravitational interaction.
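The figures quoted in the abstract can be related to one another with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the assumption that one time step evaluates N × N pairwise interactions (as in a direct-summation code) is ours, not something the abstract states. Only the 26 floating-point operations per interaction, the 767 GFLOPS figure, and the 74% fraction come from the text.

```python
# Flop-counting convention stated in the abstract.
FLOPS_PER_INTERACTION = 26


def interactions_per_second(gflops, flops_per_interaction=FLOPS_PER_INTERACTION):
    """Convert a GFLOPS figure into pairwise interactions evaluated per second."""
    return gflops * 1e9 / flops_per_interaction


def implied_peak_gflops(measured_gflops, fraction_of_peak):
    """Theoretical peak implied by a measured rate and its fraction of peak."""
    return measured_gflops / fraction_of_peak


def gflops_from_timing(n_bodies, seconds_per_step,
                       flops_per_interaction=FLOPS_PER_INTERACTION):
    """Estimate sustained GFLOPS from the wall-clock time of one step.

    ASSUMPTION (not from the abstract): one step evaluates n_bodies**2
    pairwise interactions, as in a direct-summation code.
    """
    interactions = n_bodies ** 2
    return interactions * flops_per_interaction / seconds_per_step / 1e9


if __name__ == "__main__":
    # Numbers taken from the abstract: 767 GFLOPS at 74% of peak.
    print(interactions_per_second(767.0))
    print(implied_peak_gflops(767.0, 0.74))
```

Under these assumptions, 767 GFLOPS corresponds to about 2.95 × 10^10 interactions per second, and the quoted 74% implies a theoretical peak of roughly 1.04 TFLOPS for the measurement environment.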
Journal: Procedia Computer Science - Volume 9, 2012, Pages 96-105