Article code: 529785
Journal code: 869708
Publication year: 2014
English article: 8 pages (PDF)
Full-text version: Free download
English title of the ISI article
Highly parallel GEMV with register blocking method on GPU architecture
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition
English abstract


• We propose a register blocking method for GEMV on GPU.
• The proposed method can improve the parallelism and reuse data on chip at the same time.
• Different block sizes are tested to find the best block size on a GPU platform.

GPUs can provide powerful computing ability, especially for data-parallel applications such as video/image processing. However, the complexity of the GPU system makes the optimization of even a simple algorithm difficult, and different optimization methods on a GPU often lead to very different performance. The matrix–vector multiplication routine for general dense matrices (GEMV) is an important kernel in video/image processing applications. We find that the implementations of GEMV in CUBLAS or MAGMA are not efficient, especially for small or fat matrices. In this paper, we propose a novel register blocking method to optimize GEMV on GPU architecture. This new method has three advantages. First, instead of using only one thread, we use a warp to compute an element of vector y so that the method can exploit the highly parallel GPU architecture. Second, the register blocking method is used to reduce the required off-chip memory bandwidth. Finally, the memory access order is carefully arranged for the threads in one warp so that coalesced memory access is ensured. The proposed optimization methods for GEMV are comprehensively evaluated on different matrix sizes, and the performance of the register blocking method with different block sizes is also evaluated in the experiments. Experimental results show that the new method achieves very high speedups for small square matrices and fat matrices compared to CUBLAS or MAGMA, and also achieves higher performance for large square matrices.
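As a rough illustration of the warp-per-element idea described in the abstract, the following CUDA sketch assigns one warp to each element of y, lets consecutive lanes read consecutive elements of a matrix row so that global loads are coalesced, and combines the partial sums with warp shuffles. The kernel name, the row-major layout, and the fixed warp size of 32 are illustrative assumptions; the sketch does not reproduce the paper's register blocking of x or its tuned block sizes.

```cuda
// Minimal warp-per-element GEMV sketch: y = A * x, with A row-major (m x n).
// Each warp computes one element of y. Consecutive lanes read consecutive
// addresses of a row (coalesced access), then the partial sums are combined
// with a warp-level shuffle reduction.
// Illustrative only; not the register-blocked kernel proposed in the paper.
#include <cuda_runtime.h>

__global__ void gemv_warp_per_row(const float* __restrict__ A,
                                  const float* __restrict__ x,
                                  float* __restrict__ y,
                                  int m, int n)
{
    const int WARP = 32;
    int lane   = threadIdx.x % WARP;
    int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / WARP;
    if (warpId >= m) return;

    const float* row = A + (size_t)warpId * n;
    float sum = 0.0f;
    // Lane j, j+32, j+64, ... -> coalesced loads of the row and of x.
    for (int j = lane; j < n; j += WARP)
        sum += row[j] * x[j];

    // Tree reduction of partial sums within the warp.
    for (int offset = WARP / 2; offset > 0; offset /= 2)
        sum += __shfl_down_sync(0xffffffff, sum, offset);

    if (lane == 0) y[warpId] = sum;
}
```

A typical launch for this sketch would use, for example, 128 threads per block (four warps) and ceil(m / 4) blocks, so that each row of A is handled by exactly one warp.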

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Visual Communication and Image Representation - Volume 25, Issue 7, October 2014, Pages 1566–1573