کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
523763 868488 2016 24 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Reducing latency cost in 2D sparse matrix partitioning models
ترجمه فارسی عنوان
کاهش زمان تأخیر در مدل های پارتیشن بندی ماتریس پراکنده 2D
کلمات کلیدی
حل کننده تکراری موازی؛ سیستم های خطی نامتقارن؛ ضریب برداری ماتریس انعطاف پذیر؛ پارتیشن بندی ماتریس انعطاف پذیر؛ سربار زمان تاخیر؛ سربار پهنای باند
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر
چکیده انگلیسی


• Two new sparse matrix partitioning methods for nonsymmetric iterative solvers.
• The models address multiple communication metrics critical for scalability.
• Models realized on two iterative solvers CGNE and CGNR via PETSc toolkit.
• An extensive practical comparison of eight partitioning models.
• Scaled two iterative solvers up to several thousands of processors.

Sparse matrix partitioning is a common technique used for improving performance of parallel linear iterative solvers. Compared to solvers used for symmetric linear systems, solvers for nonsymmetric systems offer more potential for addressing different multiple communication metrics due to the flexibility of adopting different partitions on the input and output vectors of sparse matrix-vector multiplication operations. In this regard, there exist works based on one-dimensional (1D) and two-dimensional (2D) fine-grain partitioning models that effectively address both bandwidth and latency costs in nonsymmetric solvers. In this work, we propose two new models based on 2D checkerboard and jagged partitioning. These models aim at minimizing total message count while maintaining a balance on communication volume loads of processors; hence, they address both bandwidth and latency costs. We evaluate all partitioning models on two nonsymmetric system solvers implemented using the widely adopted PETSc toolkit and conduct extensive experiments using these solvers on a modern system (a BlueGene/Q machine) successfully scaling them up to 8K processors. Along with the proposed models, we put practical aspects of eight evaluated models (two 1D- and six 2D-based) under thorough analysis. To the best of our knowledge, this is the first work that analyzes practical performance of 2D models on this scale. Among evaluated models, the models that rely on 2D jagged partitioning obtain the most promising results by striking a balance between minimizing bandwidth and latency costs.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Parallel Computing - Volume 57, September 2016, Pages 1–24
نویسندگان
, ,