Pushing memory bandwidth limitations through efficient implementations of Block-Krylov space solvers on GPUs

Article code	Journal code	Year of publication	English article	Full-text version
8947424	864209	2018	12-page PDF	Free download

Keywords

GPU - Graphics processing unit

Related topics

Engineering and Basic Sciences, Chemistry, Theoretical and Practical Chemistry

Abstract

The cost of the iterative solution of a sparse matrix-vector system against multiple vectors is a common challenge within scientific computing. A tremendous number of algorithmic advances, such as eigenvector deflation and domain-specific multi-grid algorithms, have been ubiquitously beneficial in reducing this cost. However, they do not address the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. Practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. We present an implementation of the block Conjugate Gradient algorithm on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. As a representative case, we consider the domain of lattice quantum chromodynamics and present results for one of the fermion discretizations. Using the QUDA library as a framework, we demonstrate a 5× speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
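The batching idea in the abstract can be illustrated with a minimal sketch. It is plain Python standing in for a GPU kernel, and it is not taken from the paper or from QUDA: the function names, the CSR storage layout, and the small tridiagonal test matrix are all assumptions made for illustration. It shows only the data-reuse pattern (each stored matrix entry is loaded once and applied to every right-hand side), not the cache and register blocking the authors describe.

```python
# Illustrative sketch only: hypothetical helper names, not QUDA code.
# The matrix is stored in CSR form (indptr, indices, data).

def csr_matvec(indptr, indices, data, x):
    """Compute y = A x for a single vector x."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def csr_matvec_batched(indptr, indices, data, X):
    """Compute Y = A X for a block X given as X[row][rhs].

    Each stored matrix entry is read once and reused for all
    right-hand sides, instead of once per right-hand side.
    """
    nrhs = len(X[0])
    Y = [[0.0] * nrhs for _ in range(len(indptr) - 1)]
    for row in range(len(indptr) - 1):
        acc = Y[row]
        for k in range(indptr[row], indptr[row + 1]):
            a = data[k]              # one matrix load ...
            x_row = X[indices[k]]
            for j in range(nrhs):    # ... reused across all right-hand sides
                acc[j] += a * x_row[j]
    return Y


def matrix_entry_loads(nnz, nrhs, batched):
    """Count matrix entries read when applying A to nrhs vectors."""
    return nnz if batched else nnz * nrhs


# Assumed test matrix: 4x4 tridiagonal (2 on the diagonal, -1 off it).
INDPTR = [0, 2, 5, 8, 10]
INDICES = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
DATA = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

if __name__ == "__main__":
    X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
    print(csr_matvec_batched(INDPTR, INDICES, DATA, X))
    print(matrix_entry_loads(len(DATA), 2, batched=False),
          matrix_entry_loads(len(DATA), 2, batched=True))
```

In this sketch, applying the matrix to nrhs vectors one at a time reads nnz × nrhs matrix entries, whereas the batched loop reads nnz entries regardless of nrhs; the vector traffic is unchanged.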

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Physics Communications - Volume 233, December 2018, Pages 29-40

Authors

M.A. Clark, Alexei Strelchenko, Alejandro Vaquero, Mathias Wagner, Evan Weinberg
