Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

Article code	Journal code	Year	English article
523768	868488	2016	20-page PDF

Keywords

Large-scale computing, Scalable algorithms, Parallel computing, Graphics processing units

Abstract

• We present a method for parallel block-sparse matrix-matrix multiplication.
• A distributed quadtree matrix representation allows exploitation of data locality.
• The quadtree structure is implemented using the Chunks and Tasks programming model.
• Data locality is exploited without prior information about matrix sparsity pattern.
• Constant communication per node on average is achieved in weak scaling tests.
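
The highlights describe a quadtree in which all-zero submatrices are simply absent, so sparsity is discovered while the tree is built rather than supplied in advance. The following is a minimal serial sketch of that idea, under stated assumptions: the names `Node`, `build`, `add`, `mul`, and `to_dense` are hypothetical, leaves are small dense lists rather than block-sparse submatrices, and nothing is distributed or uses the Chunks and Tasks API. It only mirrors the general recursive structure of a sparse quadtree product and is not the authors' implementation. A product with a missing (zero) factor returns `None` at once, so no work is spent on that subtree.

```python
"""Hypothetical serial sketch of a sparse quadtree matrix and its product.

Not the authors' library: names are invented, leaves are small dense lists,
and nothing is distributed. None stands for an all-zero submatrix.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

Dense = list[list[float]]


@dataclass
class Node:
    block: Optional[Dense] = None                    # set for leaves only
    children: Optional[list[Optional[Node]]] = None  # quadrants [11, 12, 21, 22]


def _node(kids: list[Optional[Node]]) -> Optional[Node]:
    # An internal node whose four quadrants are all zero is itself zero.
    return Node(children=kids) if any(k is not None for k in kids) else None


def build(a: Dense, r0: int, c0: int, n: int, leaf: int) -> Optional[Node]:
    """Quadtree for the n x n submatrix of `a` starting at (r0, c0).

    Zero submatrices are detected while building, so no sparsity pattern
    is needed up front. Assumes n == leaf * 2**k.
    """
    if n == leaf:
        block = [row[c0:c0 + n] for row in a[r0:r0 + n]]
        return Node(block=block) if any(any(row) for row in block) else None
    h = n // 2
    return _node([build(a, r0, c0, h, leaf), build(a, r0, c0 + h, h, leaf),
                  build(a, r0 + h, c0, h, leaf), build(a, r0 + h, c0 + h, h, leaf)])


def add(x: Optional[Node], y: Optional[Node]) -> Optional[Node]:
    """Sum of two quadtrees at the same level; None acts as zero."""
    if x is None:
        return y
    if y is None:
        return x
    if x.block is not None:  # both are leaves at the same level
        return Node(block=[[p + q for p, q in zip(rx, ry)]
                           for rx, ry in zip(x.block, y.block)])
    return _node([add(p, q) for p, q in zip(x.children, y.children)])


def mul(x: Optional[Node], y: Optional[Node], ops: list[int]) -> Optional[Node]:
    """Recursive quadtree product; ops[0] counts leaf products performed."""
    if x is None or y is None:
        return None  # zero factor: the whole subproduct is skipped
    if x.block is not None:
        ops[0] += 1
        k, m = len(y.block), len(y.block[0])
        return Node(block=[[sum(row[t] * y.block[t][j] for t in range(k))
                            for j in range(m)] for row in x.block])
    a11, a12, a21, a22 = x.children
    b11, b12, b21, b22 = y.children
    return _node([add(mul(a11, b11, ops), mul(a12, b21, ops)),
                  add(mul(a11, b12, ops), mul(a12, b22, ops)),
                  add(mul(a21, b11, ops), mul(a22, b21, ops)),
                  add(mul(a21, b12, ops), mul(a22, b22, ops))])


def to_dense(t: Optional[Node], n: int) -> Dense:
    """Expand a quadtree back into an n x n list of lists."""
    out = [[0.0] * n for _ in range(n)]

    def fill(node: Optional[Node], r0: int, c0: int, size: int) -> None:
        if node is None:
            return
        if node.block is not None:
            for i, row in enumerate(node.block):
                out[r0 + i][c0:c0 + len(row)] = row
            return
        h = size // 2
        for kid, (dr, dc) in zip(node.children, [(0, 0), (0, h), (h, 0), (h, h)]):
            fill(kid, r0 + dr, c0 + dc, h)

    fill(t, 0, 0, n)
    return out
```

For example, squaring an 8 x 8 block-diagonal matrix built with `build(a, 0, 0, 8, 2)` performs 4 leaf products here, compared with the 4 x 4 x 4 = 64 block products of a dense blocked multiplication.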

We present a method for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. By using a quadtree matrix representation, data locality is exploited without prior information about the matrix sparsity pattern. A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. The quadtree representation combined with the Chunks and Tasks model leads to favorable weak and strong scaling of the communication cost with the number of processes, as shown both theoretically and in numerical experiments.

Matrices are represented by sparse quadtrees of chunk objects. The leaves in the hierarchy are block-sparse submatrices. Sparsity is dynamically detected by the matrix library and may occur at any level in the hierarchy and/or within the submatrix leaves. In case graphics processing units (GPUs) are available, both CPUs and GPUs are used for leaf-level multiplication work, thus making use of the full computing capacity of each node.

The performance is evaluated for matrices with different sparsity structures, including examples from electronic structure calculations. Compared to methods that do not exploit data locality, our locality-aware approach reduces communication significantly, achieving essentially constant communication per node in weak scaling tests.
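
The abstract notes that sparsity can also occur inside the block-sparse leaves. A minimal sketch of a leaf-level product, assuming a leaf is stored as a dictionary from (block row, block column) to a small dense block, with absent keys meaning zero blocks. `leaf_mul`, `block_mul_add`, and the dictionary layout are hypothetical; the inner loop is plain Python standing in for the optimized CPU or GPU kernels that would do this work in practice, which are not reproduced here.

```python
"""Hypothetical sketch of a block-sparse leaf product C = A * B.

A leaf is a dict mapping (block_row, block_col) -> small dense block
(a list of rows). Missing keys are zero blocks and cost no work.
"""
from collections import defaultdict

Block = list[list[float]]
BlockSparse = dict[tuple[int, int], Block]


def block_mul_add(c: Block, a: Block, b: Block) -> None:
    """c += a * b for small dense blocks (stand-in for an optimized kernel)."""
    for i, a_row in enumerate(a):
        c_row = c[i]
        for t, a_it in enumerate(a_row):
            if a_it:  # skip zero entries inside the block as well
                for j, b_tj in enumerate(b[t]):
                    c_row[j] += a_it * b_tj


def leaf_mul(a: BlockSparse, b: BlockSparse) -> BlockSparse:
    """Multiply two block-sparse leaves, touching only nonzero block pairs."""
    # Group B's blocks by block row so each A block (i, k) finds its
    # partners B(k, j) directly instead of scanning all of B.
    b_by_row: defaultdict[int, list[tuple[int, Block]]] = defaultdict(list)
    for (k, j), blk in b.items():
        b_by_row[k].append((j, blk))
    c: BlockSparse = {}
    for (i, k), a_blk in a.items():
        for j, b_blk in b_by_row.get(k, []):
            c_blk = c.get((i, j))
            if c_blk is None:  # first contribution to C(i, j): allocate zeros
                c_blk = c[(i, j)] = [[0.0] * len(b_blk[0]) for _ in a_blk]
            block_mul_add(c_blk, a_blk, b_blk)
    return c
```

Only pairs A(i, k), B(k, j) that are both present produce work, and output blocks are created on first contribution, so the result is block-sparse without any pattern being computed in advance.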

Publisher

Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 57, September 2016, Pages 87–106

Authors

Emanuel H. Rubensson, Elias Rudberg