A fine-grained block ILU scheme on regular structures for GPGPUs

Article code	Journal code	Publication year	English article	Full-text version
761555	1462691	2015	13-page PDF	Free download

Keywords

GPGPU; OpenACC; CUDA (parallel processing and programming model)

Related topics

Engineering and basic sciences; other engineering disciplines; computational mechanics

Abstract

• A fine-grained block ILU (FGBILU) scheme has been implemented using OpenACC and CUDA.
• A fully vectorized inversion algorithm using Gauss–Jordan elimination is developed.
• FGBILU remains mathematically identical to sequential BILU.
• FGBILU provides excellent speedup over sequential BILU on CPU.
• FGBILU has been fully incorporated and validated in a legacy CFD solver INCOMP3D.
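
The element-wise Gauss–Jordan inversion named in the highlights can be illustrated with a small Python sketch. It is not the paper's OpenACC or CUDA code. The function name `gauss_jordan_inplace`, the lack of pivoting, and the snapshot of the pivot row and column are all assumptions made for this illustration. The point it tries to show is that, within one pivot step, every one of the n × n entries can be updated independently from old values, which is what would let each entry map to its own GPU thread.

```python
def gauss_jordan_inplace(a):
    """Overwrite the square matrix a (list of lists of floats) with its inverse.

    Assumption of this sketch: no pivoting, so every pivot must be nonzero
    (for example, diagonally dominant blocks).
    """
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # Snapshot the pivot row and column so that each element update
        # below reads only old values and could run as a separate thread.
        row = a[k][:]
        col = [a[i][k] for i in range(n)]
        for i in range(n):
            for j in range(n):
                if i == k and j == k:
                    a[i][j] = 1.0 / pivot            # pivot entry
                elif i == k:
                    a[i][j] = row[j] / pivot         # pivot row
                elif j == k:
                    a[i][j] = -col[i] / pivot        # pivot column
                else:
                    a[i][j] -= col[i] * row[j] / pivot  # rank-1 update
    return a


block = [[4.0, 3.0], [6.0, 3.0]]
inverse = gauss_jordan_inplace([r[:] for r in block])
```

Here `block` is a made-up 2 × 2 example. Multiplying `block` by `inverse` should give the identity up to rounding error.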

Iterative methods based on block incomplete LU (BILU) factorization are considered highly effective for solving large-scale block-sparse linear systems resulting from coupled PDE systems with n equations. However, efforts on porting implicit PDE solvers to massively parallel shared-memory heterogeneous architectures, such as general-purpose graphics processing units (GPGPUs), have largely avoided BILU, leaving their enormous performance potential unfulfilled in many applications where the use of implicit schemes and BILU-type preconditioners/solvers is highly preferred. Indeed, strong inherent data dependency and high memory bandwidth demanded by block matrix operations render naive adoptions of existing sequential BILU algorithms extremely inefficient on GPGPUs. In this study, we present a fine-grained BILU (FGBILU) scheme which is particularly effective on GPGPUs. A straightforward one-sweep wavefront ordering is employed to resolve data dependency. Granularity is substantially refined as block matrix operations are carried out in a true element-wise approach. Particularly, the inversion of diagonal blocks, a well-known bottleneck, is accomplished by a parallel in-place Gauss–Jordan elimination. As a result, FGBILU is able to offer low-overhead concurrent computation at $O(n^2N^2)$ scale on a 3D PDE domain with a linear scale of N. FGBILU has been implemented with both OpenACC and CUDA and tested as a block-sparse linear solver on a structured 3D grid. While FGBILU remains mathematically identical to sequential global BILU, numerical experiments confirm its exceptional performance on an Nvidia GPGPU.
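
The wavefront ordering mentioned in the abstract can be sketched in Python under stated assumptions. The abstract does not give the stencil or the ordering rule, so this example assumes a 7-point stencil in which cell (i, j, k) depends on (i-1, j, k), (i, j-1, k) and (i, j, k-1) during the forward sweep, and it groups cells by the hyperplane index w = i + j + k. The function name `wavefronts` is hypothetical. Under these assumptions, all cells in one wavefront are mutually independent, and the widest wavefront holds on the order of N^2 cells, which together with n × n element-wise block work is consistent with the concurrency scale quoted above.

```python
def wavefronts(nx, ny, nz):
    """Group the cells of an nx * ny * nz grid by hyperplane index w = i + j + k.

    Assumption of this sketch: a cell depends only on its lower neighbours
    (i-1, j, k), (i, j-1, k) and (i, j, k-1), so cells with equal w are
    independent and can be processed concurrently.
    """
    planes = [[] for _ in range(nx + ny + nz - 2)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                planes[i + j + k].append((i, j, k))
    return planes


def dependencies_resolved(planes):
    """Check that every lower neighbour lies in an earlier wavefront."""
    done = set()
    for plane in planes:
        for (i, j, k) in plane:
            for dep in ((i - 1, j, k), (i, j - 1, k), (i, j, k - 1)):
                if min(dep) >= 0 and dep not in done:
                    return False
        done.update(plane)  # the whole plane finishes before the next starts
    return True


planes = wavefronts(4, 4, 4)
ok = dependencies_resolved(planes)
widest = max(len(p) for p in planes)
```

On a GPU, one natural mapping would be a sequential loop over the wavefronts with a parallel launch over the cells of each one. That mapping is a suggestion of this sketch, not a description of the authors' implementation.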

Publisher

Database: Elsevier - ScienceDirect
Journal: Computers & Fluids, Volume 119, 22 September 2015, Pages 149–161

Authors

Lixiang Luo, Jack R. Edwards, Hong Luo, Frank Mueller
