Article code: 761873
Journal code: 1462718
Publication year: 2014
English article: 12-page PDF
Full-text version: free download
English title of the ISI article
An efficient GPU implementation of cyclic reduction solver for high-order compressible viscous flow simulations
Related topics
Engineering and Basic Sciences, Other Engineering Disciplines, Computational Mechanics
English abstract


• A global-memory-based Cyclic Reduction (CR) algorithm is implemented on the GPU.
• The proposed sort algorithm for memory transactions is well suited to the GPU.
• The CR solver is applied to 2D and 3D compressible viscous flows with a compact scheme.
• The GPU solver provides speedups of up to 15.2× in 2D and 20.3× in 3D simulations.

In this paper, the performance of the Cyclic Reduction (CR) algorithm for solving tridiagonal systems is improved with the aid of efficient global memory transactions on Graphics Processing Units (GPUs). To achieve maximum memory throughput with a lower computational runtime, two different sort algorithms are introduced for reordering the initial system of equations: direct and step-by-step. It is shown that the latter method is well suited to modern GPUs and achieves speedups of up to 3.47× in single precision and 2.1× in double precision compared to the CPU Thomas algorithm. By benefiting from the new global memory implementation, the CR solver can run 2×–100× faster than previous works on parallel tridiagonal solvers. The CR solver is also applied to 2D and 3D compressible viscous flow simulations using the high-order compact finite-difference scheme. In these simulations, the filtering, primitive-variable, and flux-derivative calculations are carried out using the parallel tridiagonal solver on the GPU device. The GPU-accelerated calculations achieve speedups of 1.9×–15.2× in 2D and 6.4×–20.3× in 3D simulations for different grid sizes compared to CPU computations. The computations are performed on an NVIDIA GTX480 GPU. The obtained results are compared, in terms of calculation runtime, to those achieved on a single core of an Intel Core 2 Duo (2.7 GHz, 2 MB cache).
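
For readers unfamiliar with the two solvers named in the abstract, the following is a minimal serial sketch in Python of the textbook Thomas algorithm and of cyclic reduction for a system a[i]·x[i-1] + b[i]·x[i] + c[i]·x[i+1] = d[i]. It is not the paper's GPU kernel and does not reproduce its direct or step-by-step sort schemes. The function names `thomas` and `cyclic_reduction` and the test system are illustrative assumptions. The sketch also assumes a diagonally dominant matrix, so no pivoting is done.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm: forward elimination, then back substitution.

    a, b, c are the sub-, main and super-diagonals; a[0] and c[-1] must be 0.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # each step depends on the previous one
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a, b, c, d):
    """Cyclic reduction, written recursively for any system size n >= 1.

    Each level eliminates the even-indexed unknowns, leaving a tridiagonal
    system of half the size in the odd-indexed unknowns. The loop bodies at
    one level are independent of each other, which is what a parallel
    implementation exploits; here they simply run one after another.
    """
    n = len(d)
    if n == 1:
        return [d[0] / b[0]]

    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):                   # forward reduction (independent iterations)
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))

    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)  # solve the half-size system

    for i in range(0, n, 2):                   # back substitution (independent iterations)
        s = d[i]
        if i > 0:
            s -= a[i] * x[i - 1]
        if i + 1 < n:
            s -= c[i] * x[i + 1]
        x[i] = s / b[i]
    return x


# Illustrative system (1, 4, 1) whose exact solution is all ones.
n = 10
a = [0.0] + [1.0] * (n - 1)
b = [4.0] * n
c = [1.0] * (n - 1) + [0.0]
d = [5.0] + [6.0] * (n - 2) + [5.0]
x_cr = cyclic_reduction(a, b, c, d)
x_th = thomas(a, b, c, d)
```

The Thomas loop carries a dependency from one row to the next, whereas each cyclic reduction level touches rows that are a growing stride apart. On a GPU that strided access is the kind of global memory pattern a reordering of the equations would aim to make contiguous.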

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers & Fluids - Volume 92, 20 March 2014, Pages 160–171