A model-driven blocking strategy for load balanced sparse matrix–vector multiplication on GPUs

Article ID	Journal ID	Year	English article
432304	688855	2015	13-page PDF

English title (ISI article)

A model-driven blocking strategy for load balanced sparse matrix–vector multiplication on GPUs



Keywords

SpMV; GPU (graphics processing unit); CUDA; parallel processing and programming models

Related topics

Engineering and basic sciences; Computer engineering; Theory of computation and mathematics


Abstract

• A novel blocking strategy that reduces thread divergence and improves load balance.
• Enhanced performance modeling for selection of a key blocking parameter.
• An efficient auto-tuning technique to optimize performance.
• Comprehensive experimental evaluation and integration with a real system, PETSc.
• A multi-GPU algorithm for SpMV with experimental evaluation.
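The first highlight's claim about thread divergence can be made concrete with a simple count. The sketch below is an illustrative model, not the paper's performance model: it assumes a one-thread-per-row mapping with 32-thread warps, where every lane of a warp keeps stepping until the longest row in that warp is finished. The names `padded_slots` and `efficiency` are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def padded_slots(row_nnz, warp_size=WARP_SIZE):
    """Lane-steps spent if consecutive rows are mapped one per thread.

    All lanes of a warp keep stepping until the warp's longest row is done,
    so each warp costs warp_size * max(nnz) lane-steps (idle lanes of a
    partially filled last warp are counted as waste too).
    """
    total = 0
    for start in range(0, len(row_nnz), warp_size):
        total += warp_size * max(row_nnz[start:start + warp_size])
    return total


def efficiency(row_nnz, warp_size=WARP_SIZE):
    """Useful lane-steps / spent lane-steps; 1.0 means no divergence waste."""
    spent = padded_slots(row_nnz, warp_size)
    return 1.0 if spent == 0 else sum(row_nnz) / spent


if __name__ == "__main__":
    rows = [1, 31] * 32  # 64 rows alternating short and long
    print(efficiency(rows))                        # original order: about 0.516
    print(efficiency(sorted(rows, reverse=True)))  # sorted by nnz: 1.0
```

Sorting rows by length removes the waste inside each warp in this toy case, but the warp holding the long rows still does far more work than the one holding the short rows; that remaining load imbalance is what fixed-size blocks are meant to address.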

Sparse Matrix–Vector multiplication (SpMV) is one of the key operations in linear algebra. Overcoming thread divergence, load imbalance, and uncoalesced and indirect memory accesses caused by sparsity and irregularity are the main challenges in optimizing SpMV on GPUs. In this paper we present a new Blocked Row–Column (BRC) storage format with a two-dimensional blocking mechanism that addresses these challenges effectively. It reduces thread divergence by reordering the rows of the input matrix and blocking rows with nearly equal numbers of non-zero elements onto the same execution units (i.e., warps). BRC improves load balance by partitioning rows into blocks with a constant number of non-zeros, so that different warps perform the same amount of work. We also present an approach to optimize BRC performance by judicious selection of the block size based on the sparsity characteristics of the matrix. A CUDA implementation of BRC outperforms the NVIDIA CUSP and cuSPARSE libraries and other state-of-the-art SpMV formats on a range of unstructured sparse matrices from multiple application domains. The BRC format has been integrated with PETSc, enabling its use in PETSc's solvers. Furthermore, when partitioning the input matrix, BRC achieves near-linear speedup on multiple GPUs.
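To make the two blocking ideas described in the abstract concrete (reordering rows by non-zero count, then cutting each group of rows into fixed-size blocks), here is a minimal Python sketch. It is a simplified illustration under stated assumptions, not the paper's actual BRC layout or CUDA kernel: `Block`, `to_blocked`, `spmv_blocked` and `spmv_csr` are hypothetical names, `b1` (rows per block) stands in for the warp size, and `b2` (non-zeros per row per block) is an assumed blocking parameter.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One fixed-size work unit: at most b1 rows, each padded to b2 slots."""
    rows: list = field(default_factory=list)  # original row indices
    cols: list = field(default_factory=list)  # per row: b2 column ids, -1 = padding
    vals: list = field(default_factory=list)  # per row: b2 values, 0.0 = padding


def to_blocked(ptr, col, val, b1=32, b2=4):
    """Build a simplified BRC-like blocked layout from CSR arrays."""
    n = len(ptr) - 1
    nnz = [ptr[r + 1] - ptr[r] for r in range(n)]
    # Step 1: reorder rows by decreasing non-zero count, so the rows that
    # share a block (one warp on a GPU) have nearly equal lengths.
    order = sorted(range(n), key=lambda r: nnz[r], reverse=True)
    blocks = []
    for start in range(0, n, b1):
        group = order[start:start + b1]
        width = nnz[group[0]]  # longest row in the group, since it is sorted
        # Step 2: cut the group into column chunks of b2 non-zeros per row,
        # so no block holds more than b1 * b2 entries of work.
        for off in range(0, width, b2):
            blk = Block()
            for r in group:
                lo, end = ptr[r] + off, ptr[r + 1]
                if lo >= end:
                    continue  # this row has no entries left in this chunk
                hi = min(lo + b2, end)
                pad = b2 - (hi - lo)
                blk.rows.append(r)
                blk.cols.append(list(col[lo:hi]) + [-1] * pad)
                blk.vals.append(list(val[lo:hi]) + [0.0] * pad)
            blocks.append(blk)
    return blocks


def spmv_blocked(blocks, x, n):
    """y = A @ x over the blocked layout (each block ~ one warp)."""
    y = [0.0] * n
    for blk in blocks:
        for r, cs, vs in zip(blk.rows, blk.cols, blk.vals):  # ~ one lane per row
            acc = 0.0
            for c, v in zip(cs, vs):
                if c >= 0:  # skip padding slots
                    acc += v * x[c]
            # A long row spans several blocks, so partial sums are accumulated
            # (a GPU version would need an atomic add or a separate reduction).
            y[r] += acc
    return y


def spmv_csr(ptr, col, val, x):
    """Reference CSR SpMV used to check the blocked result."""
    return [sum(val[k] * x[col[k]] for k in range(ptr[r], ptr[r + 1]))
            for r in range(len(ptr) - 1)]


if __name__ == "__main__":
    # 4x4 example: row lengths 2, 1, 4, 0
    ptr = [0, 2, 3, 7, 7]
    col = [0, 2, 1, 0, 1, 2, 3]
    val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    x = [1.0, 2.0, 3.0, 4.0]
    blocks = to_blocked(ptr, col, val, b1=2, b2=2)
    print(len(blocks))                 # 3
    print(spmv_blocked(blocks, x, 4))  # [7.0, 6.0, 60.0, 0.0]
    print(spmv_csr(ptr, col, val, x))  # [7.0, 6.0, 60.0, 0]
```

In this sketch a larger `b2` means fewer partial sums per long row but more padding for short rows. The paper selects its key blocking parameter with a performance model and auto-tuning, which this example does not attempt to reproduce.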

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 76, February 2015, Pages 3–15

Authors

Arash Ashari, Naser Sedaghati, John Eisenlohr, P. Sadayappan
