Towards an ultra efficient kinetic scheme. Part III: High-performance-computing

Article ID	Journal	Published Year	Pages	File Type
6931671	Journal of Computational Physics	2015	18 Pages	PDF

Abstract

In this paper we demonstrate the capability of the fast semi-Lagrangian scheme developed in [20] and [21] to deal with parallel architectures. First, we present the behavior of this scheme on a classical architecture using OpenMP and then on a GPU (Graphics Processing Unit) architecture using CUDA. The goal is to prove that this new scheme is well adapted to these types of parallelization and, moreover, that the gain in CPU time is substantial on today's affordable computers. We first present the sequential version of our high-order kinetic scheme and focus on the details that matter for an effective parallel implementation. Then, we introduce the specific treatments and algorithms developed for the OpenMP and CUDA parallelizations. Numerical tests are shown for full 3D/3D simulations. These assess the substantial speed-up of the parallel versions over the sequential code and the very good scalability of the method, which make this approach a real competitor to existing schemes for the solution of multidimensional kinetic models.

Keywords

Semi-Lagrangian schemes; Parallel computation; Discrete velocity models; Kinetic equations; GPU; CUDA