Article ID Journal Published Year Pages File Type
761942 Computers & Fluids 2014 16 Pages PDF
Abstract

• Discretizations for non-conservative terms with a simple HLLC scheme are derived.
• A TVD Runge–Kutta method is implemented with properly sequenced operator splitting.
• A special procedure for block synchronization without exiting a kernel is proposed.

In this paper, the application of an HLLC-type approximate Riemann solver in conjunction with the third-order TVD Runge–Kutta method to the seven-equation compressible two-phase model on multiple Graphics Processing Units (GPUs) is presented. Based on the idea proposed by Abgrall et al. that “a multiphase flow, uniform in pressure and velocity at t = 0, will remain uniform on the same variables during time evolution”, discretization schemes for the non-conservative terms and for the volume fraction evolution equation are derived in accordance with the HLLC solver used for the conservative terms. To attain high temporal accuracy, the third-order TVD Runge–Kutta method is implemented in conjunction with an operator splitting technique, in which the sequence of operators is recorded in order to compute free surface problems robustly. For large-scale simulations, the numerical method is implemented using an MPI/Pthread-CUDA parallelization paradigm for multiple GPUs. A domain decomposition method is used to distribute data to different GPUs, parallel computation inside a GPU is accomplished using CUDA, and communication between GPUs is performed via MPI or Pthread. An efficient data structure and careful GPU memory usage are employed to maintain high memory bandwidth on the device, while a special procedure is designed to synchronize thread blocks so as to reduce the frequency of kernel launches. Numerical tests on several one- and two-dimensional compressible two-phase flow problems with high density and high pressure ratios demonstrate that the present method is accurate and robust. The timing tests show that the overall speedup of one NVIDIA Tesla C2075 GPU is 31× compared with one Intel Xeon Westmere 5675 CPU core, and nearly 70% parallel efficiency can be obtained when using 8 GPUs.
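
For readers unfamiliar with the time integrator named above, the following is a minimal sketch of the standard third-order TVD (Shu–Osher) Runge–Kutta scheme, not the paper's implementation. The function `rhs` is a hypothetical placeholder for the spatial operator (in the paper, the HLLC-based discretization); here it is replaced by a scalar decay problem, and the operator splitting and GPU parallelization are not modelled. The names `tvd_rk3_step`, `integrate`, and `decay` are illustrative only.

```python
import math


def tvd_rk3_step(u, dt, rhs):
    """Advance u by one step of the third-order TVD (Shu-Osher) Runge-Kutta scheme.

    Each stage is a convex combination of forward-Euler updates.
    """
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(u2))


def integrate(u0, t_end, n_steps, rhs):
    """Integrate du/dt = rhs(u) from t = 0 to t_end with a fixed step size."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = tvd_rk3_step(u, dt, rhs)
    return u


def decay(u):
    """Stand-in right-hand side (assumption): du/dt = -u, exact solution exp(-t)."""
    return -u


# Halving the step size should reduce the error by roughly a factor of 2**3.
exact = math.exp(-1.0)
err_coarse = abs(integrate(1.0, 1.0, 20, decay) - exact)
err_fine = abs(integrate(1.0, 1.0, 40, decay) - exact)
observed_order = math.log2(err_coarse / err_fine)
```

In a finite-volume setting, `u` would be the array of cell averages and `rhs` would return the flux differences plus the non-conservative terms; the three-stage structure stays the same.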

Related Topics
Physical Sciences and Engineering > Engineering > Computational Mechanics