Article code: 761555
Journal code: 1462691
Publication year: 2015
English article: 13 pages, PDF
Full-text version: free download
English title of the ISI article
A fine-grained block ILU scheme on regular structures for GPGPUs
Related topics
Engineering and Basic Sciences > Other Engineering Disciplines > Computational Mechanics
Abstract


• A fine-grained block ILU (FGBILU) has been implemented using OpenACC and CUDA.
• A fully vectorized inversion algorithm using Gauss–Jordan elimination is developed.
• FGBILU remains mathematically identical to sequential BILU.
• FGBILU provides excellent speedup over sequential BILU on CPU.
• FGBILU has been fully incorporated and validated in a legacy CFD solver INCOMP3D.
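
The inversion highlight above can be illustrated with a minimal sketch of in-place Gauss–Jordan inversion. This is not the paper's OpenACC or CUDA code: the function name `gauss_jordan_inverse_inplace`, the list-of-lists storage, and the absence of pivoting are all assumptions made for illustration. At each pivot step the pivot row and column are snapshotted first, so every entry (i, j) can be updated independently of the others, which is the property an element-wise GPU kernel would rely on.

```python
def gauss_jordan_inverse_inplace(a):
    """Invert the square matrix `a` (a list of lists of floats) in place.

    Hypothetical helper for illustration: no pivoting is done, so every
    pivot encountered is assumed to be nonzero.
    """
    n = len(a)
    for p in range(n):
        piv = a[p][p]
        if piv == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Snapshot the pivot row and column so that each (i, j) update
        # below reads only old values and is independent of the others.
        row = a[p][:]
        col = [a[i][p] for i in range(n)]
        for i in range(n):
            for j in range(n):
                if i == p and j == p:
                    a[i][j] = 1.0 / piv
                elif i == p:
                    a[i][j] = row[j] / piv
                elif j == p:
                    a[i][j] = -col[i] / piv
                else:
                    a[i][j] -= col[i] * row[j] / piv
    return a


if __name__ == "__main__":
    block = [[4.0, 7.0], [2.0, 6.0]]
    gauss_jordan_inverse_inplace(block)
    print(block)
```

In this sketch the two inner loops over i and j are the part that could be mapped to one thread per matrix entry; the outer loop over pivots stays sequential.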

Iterative methods based on block incomplete LU (BILU) factorization are considered highly effective for solving large-scale block-sparse linear systems resulting from coupled PDE systems with n equations. However, efforts on porting implicit PDE solvers to massively parallel shared-memory heterogeneous architectures, such as general-purpose graphics processing units (GPGPUs), have largely avoided BILU, leaving their enormous performance potential unfulfilled in many applications where the use of implicit schemes and BILU-type preconditioners/solvers is highly preferred. Indeed, strong inherent data dependency and high memory bandwidth demanded by block matrix operations render naive adoptions of existing sequential BILU algorithms extremely inefficient on GPGPUs. In this study, we present a fine-grained BILU (FGBILU) scheme which is particularly effective on GPGPUs. A straightforward one-sweep wavefront ordering is employed to resolve data dependency. Granularity is substantially refined as block matrix operations are carried out in a true element-wise approach. Particularly, the inversion of diagonal blocks, a well-known bottleneck, is accomplished by a parallel in-place Gauss–Jordan elimination. As a result, FGBILU is able to offer low-overhead concurrent computation at $O(n^2 N^2)$ scale on a 3D PDE domain with a linear scale of N. FGBILU has been implemented with both OpenACC and CUDA and tested as a block-sparse linear solver on a structured 3D grid. While FGBILU remains mathematically identical to sequential global BILU, numerical experiments confirm its exceptional performance on an Nvidia GPGPU.
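
The wavefront ordering mentioned in the abstract can be sketched as follows. The abstract does not spell out the ordering, so this sketch assumes the common hyperplane form for a structured 3D grid: cell (i, j, k) is taken to depend only on its lower neighbours (i-1, j, k), (i, j-1, k) and (i, j, k-1), and cells are grouped by the index i + j + k. The function name `wavefronts` is hypothetical. Under that assumption, all cells in one group can be processed concurrently, and each group holds on the order of N^2 cells for an N x N x N grid.

```python
def wavefronts(nx, ny, nz):
    """Group the cells (i, j, k) of an nx x ny x nz grid by i + j + k.

    Assumed dependency pattern: a cell needs only its lower neighbours
    in each direction, which always lie on the previous hyperplane.
    """
    # The hyperplane index runs from 0 to (nx - 1) + (ny - 1) + (nz - 1).
    fronts = [[] for _ in range(nx + ny + nz - 2)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                fronts[i + j + k].append((i, j, k))
    return fronts


if __name__ == "__main__":
    # One sweep: fronts are visited in order; the cells inside a front
    # have no mutual dependencies under the assumption above.
    for s, front in enumerate(wavefronts(2, 2, 2)):
        print(s, front)
```

Combined with an element-wise treatment of each n x n block, this gives roughly n^2 times N^2 independent work items per front, which matches the concurrency scale quoted in the abstract.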

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers & Fluids - Volume 119, 22 September 2015, Pages 149–161
Authors
, , , ,