OpenCL Implementation of Basic Operations for a High-order Finite-volume Polynomial Scheme on Unstructured Hybrid Meshes

Article ID	Journal	Published Year	Pages	File Type
859909	Procedia Engineering	2013	5 Pages	PDF

Abstract

A parallel finite-volume algorithm based on a cell-centered high-order polynomial scheme for unstructured hybrid meshes is under consideration. The work is focused on the adaptation and optimization of basic operations of the algorithm to different architec- tures of massively-parallel accelerators including GPU of AMD and NVIDIA. Such an algorithm is especially problematic for the GPU architectures since it has very low FLOP per byte ratio meaning that performance is dominated by the memory bandwidth but not the computing performance of a device. At the same time it has irregular memory access pattern since unstructured meshes are used. The calculation of polynomial coefficients and the calculation of convective fluxes through faces of cells are the most interesting and time consuming operations of the algorithm. Implementations of these operations for accelerators using OpenCL are considered here in detail. The ways to improve the computational efficiency are proposed, performance measurement results reaching up to 160 GFLOPS on a single GPU device are demonstrated.