Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
6933871 | Journal of Computational Physics | 2013 | 17 Pages |
Abstract
Direct numerical simulations of turbulence are optimized for up to 192 graphics processors. The results from two large GPU clusters are compared to the performance of corresponding CPU clusters. A number of important algorithm changes are necessary to access the full computational power of graphics processors, and these adaptations are discussed. It is shown that the handling of subdomain communication becomes even more critical when using GPU-based supercomputers. The potential for overlap of MPI communication with GPU computation is analyzed and then optimized. Detailed timings reveal that the internal calculations are now so efficient that the operations related to MPI communication are the primary scaling bottleneck at all but the very largest problem sizes that can fit on the hardware. This work gives a glimpse of the CFD performance issues that will dominate many hardware platforms in the near future.
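The communication/computation overlap the abstract refers to follows a standard pattern in domain-decomposed solvers: start a non-blocking exchange of subdomain boundary (halo) data, compute the interior points that need no neighbor data while the exchange is in flight, then wait for the exchange before touching boundary points. The paper's implementation uses MPI and CUDA; the sketch below only illustrates the pattern in plain Python with a worker thread standing in for non-blocking MPI calls, and all function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def exchange_halos(halo):
    # Stand-in for non-blocking MPI_Isend/MPI_Irecv of boundary data;
    # here we just pretend the neighbor sends back doubled values.
    return [2 * x for x in halo]

def compute_interior(interior):
    # Interior stencil work needs no neighbor data, so it can run
    # while the halo exchange is still in flight.
    return [x + 1 for x in interior]

def timestep(interior, halo):
    with ThreadPoolExecutor(max_workers=1) as pool:
        comm = pool.submit(exchange_halos, halo)   # start "communication"
        new_interior = compute_interior(interior)  # overlapped computation
        new_halo = comm.result()                   # wait, like MPI_Waitall
    return new_interior, new_halo

interior, halo = timestep([1, 2, 3], [10, 20])
print(interior, halo)  # [2, 3, 4] [20, 40]
```

On a real GPU cluster the same structure appears as asynchronous device-to-host copies and kernels on separate CUDA streams, overlapped with MPI requests; the degree of achievable overlap is exactly what the paper analyzes.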
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
Ali Khajeh-Saeed, J. Blair Perot