Article code: 8947424
Journal code: 864209
Publication year: 2018
English article: 12-page PDF
Full-text version: free download
English title of the ISI article
Pushing memory bandwidth limitations through efficient implementations of Block-Krylov space solvers on GPUs
Related topics
Engineering and Basic Sciences > Chemistry > Theoretical and Practical Chemistry
English abstract
The cost of the iterative solution of a sparse matrix-vector system against multiple vectors is a common challenge within scientific computing. A tremendous number of algorithmic advances, such as eigenvector deflation and domain-specific multi-grid algorithms, have been ubiquitously beneficial in reducing this cost. However, they do not address the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. Practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. We present an implementation of the block Conjugate Gradient algorithm on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. As a representative case, we consider the domain of lattice quantum chromodynamics and present results for one of the fermion discretizations. Using the QUDA library as a framework, we demonstrate a 5× speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
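The bandwidth argument in the abstract can be illustrated with a toy sketch. It assumes a small sparse matrix in CSR form and counts how often a matrix entry is read from memory. The matrix, the function names (`spmv_single`, `spmv_batched`) and the `loads` counter are hypothetical and chosen only for illustration; this is not the QUDA implementation and says nothing about its GPU kernels.

```python
# Toy CSR matrix (assumed for illustration):
# [[4, 1, 0],
#  [1, 4, 1],
#  [0, 1, 4]]
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]


def spmv_single(values, col_idx, row_ptr, x):
    """Compute y = A x for one vector; also return the number of matrix entries read."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    loads = 0
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            loads += 1  # one matrix entry fetched for one vector
            y[i] += values[p] * x[col_idx[p]]
    return y, loads


def spmv_batched(values, col_idx, row_ptr, X):
    """Compute Y = A X for k vectors at once; X[j][b] is entry j of vector b."""
    n = len(row_ptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n)]
    loads = 0
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = values[p]  # fetched once, then reused for all k vectors
            loads += 1
            x_row = X[col_idx[p]]
            for b in range(k):
                Y[i][b] += a * x_row[b]
    return Y, loads


# Two right-hand sides stored side by side: [1, 2, 3] and [0, 1, 0].
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
k = len(X[0])

Y_batched, batched_loads = spmv_batched(values, col_idx, row_ptr, X)

# Independent solves apply the matrix to each vector separately.
independent_loads = 0
Y_independent = [[0.0] * k for _ in range(len(X))]
for b in range(k):
    y, loads = spmv_single(values, col_idx, row_ptr, [row[b] for row in X])
    independent_loads += loads
    for i, yi in enumerate(y):
        Y_independent[i][b] = yi
```

Both paths give the same result, but in this model the batched version reads each matrix entry once regardless of the number of vectors, while the independent version reads the whole matrix once per vector. The vector data still has to be moved in both cases, so the sketch captures only the matrix-traffic part of the argument.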
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Physics Communications - Volume 233, December 2018, Pages 29-40