Article code: 432368
Journal code: 688869
Publication year: 2013
English article: 12-page PDF
Full-text version: free download
English title of the ISI article
Sparse matrix–vector multiplication on the Single-Chip Cloud Computer many-core processor
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract


• Evaluation of the performance and power efficiency of the SpMV on the SCC many-core.
• Some of the most successful SpMV optimization techniques have been analyzed.
• SCC is very sensitive to locality improvements due to its memory hierarchy.
• Architectural comparison with several leading multi/many-core processors and GPUs.
• Best performance results are obtained by high-end GPUs and the Phi coprocessor.

The microprocessor industry has responded to the memory, power and ILP walls by turning to many-core processors, making increased parallelism the primary means of improving processor performance. These processors are expected to consist of tens or even hundreds of cores. One such processor is the 48-core experimental Single-Chip Cloud Computer (SCC), created by Intel Labs as a platform for many-core software research.

In this work we study the behavior of an important irregular application, Sparse Matrix–Vector multiplication (SpMV), on the SCC processor in terms of performance and power efficiency. In addition, some of the most successful optimization techniques for this kernel are evaluated, in particular reordering, blocking and data compression. Our experiments give key insights that can serve as guidelines for understanding and optimizing the SpMV kernel on this architecture.

Furthermore, an architectural comparison of the SCC processor with several leading multicore processors and GPUs is performed, including the new Intel Xeon Phi coprocessor. The SCC outperforms only the Itanium2 multicore processor. The best performance results are observed for the high-end GPUs and the Phi, although they reach only a small fraction of their peak performance. In terms of power efficiency, the ATI GPUs stand out for their good behavior.
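
For readers unfamiliar with the kernel, the following is a minimal sequential sketch of SpMV, assuming the matrix is stored in the widely used compressed sparse row (CSR) format. The abstract does not state which storage format or programming language the authors use, so CSR and the names `spmv_csr`, `row_ptr`, `col_idx` and `values` are illustrative assumptions, not the paper's implementation.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A held in CSR form.

    row_ptr, col_idx and values are hypothetical names for the three
    CSR arrays; they are not taken from the paper.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i sit at positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read indirectly through col_idx: the irregular access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

The indirect read `x[col_idx[k]]` is what makes the kernel irregular. Reordering and blocking techniques generally aim to improve the locality of these accesses, and compression reduces the volume of index data that must be moved from memory.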

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 12, December 2013, Pages 1539–1550