Article ID: 859909
Journal: Procedia Engineering
Published Year: 2013
Pages: 5
File Type: PDF
Abstract

A parallel finite-volume algorithm based on a cell-centered high-order polynomial scheme for unstructured hybrid meshes is considered. The work focuses on adapting and optimizing the basic operations of the algorithm for different architectures of massively parallel accelerators, including GPUs from AMD and NVIDIA. Such an algorithm is especially problematic for GPU architectures because it has a very low FLOP-per-byte ratio, meaning that performance is limited by the memory bandwidth rather than by the computing performance of the device. At the same time, it has an irregular memory access pattern because unstructured meshes are used. The calculation of polynomial coefficients and the calculation of convective fluxes through cell faces are the most interesting and time-consuming operations of the algorithm. Implementations of these operations for accelerators using OpenCL are considered here in detail. Ways to improve the computational efficiency are proposed, and performance measurements reaching up to 160 GFLOPS on a single GPU device are demonstrated.
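The statement that a low FLOP-per-byte ratio makes performance bandwidth-limited can be illustrated with a simple roofline-style estimate. The sketch below is not taken from the paper: the function names and the device figures (peak rate, bandwidth, arithmetic intensity) are hypothetical values chosen only to show the reasoning.

```python
def attainable_gflops(flop_per_byte, peak_gflops, bandwidth_gb_s):
    """Roofline-style upper bound on sustained performance.

    The kernel cannot run faster than the compute peak, and it cannot
    process data faster than memory delivers it, so the bound is the
    smaller of the two limits.
    """
    memory_roof = flop_per_byte * bandwidth_gb_s  # GFLOPS the memory can feed
    return min(peak_gflops, memory_roof)


def is_memory_bound(flop_per_byte, peak_gflops, bandwidth_gb_s):
    """True when the memory limit lies below the compute peak."""
    return flop_per_byte * bandwidth_gb_s < peak_gflops


if __name__ == "__main__":
    # Hypothetical device and kernel figures, for illustration only.
    peak = 1000.0      # GFLOPS, assumed compute peak
    bandwidth = 200.0  # GB/s, assumed memory bandwidth
    intensity = 0.5    # FLOP per byte, assumed for a low-intensity kernel

    bound = attainable_gflops(intensity, peak, bandwidth)
    print(f"attainable: {bound} GFLOPS, "
          f"memory bound: {is_memory_bound(intensity, peak, bandwidth)}")
```

Under these assumed figures the bound sits well below the compute peak, which is why optimizations for such kernels tend to target memory traffic and access patterns rather than arithmetic.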
