Article code: 488164
Journal code: 703692
Publication year: 2011
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
Abstract

This paper presents the results of our study on double-precision general matrix-matrix multiplication (DGEMM) for GPU-equipped systems. We applied further optimizations to the DGEMM stream kernel previously implemented for the AMD Cypress GPU. We examined the effect of different memory access patterns on the performance of the DGEMM kernel by changing its layout function. The experimental results show that the GEMM kernel with the X-Morton layout function is superior to the kernel with any of the other layout functions in terms of both performance and cache hit rate. Moreover, we implemented a DGEMM routine for large matrices whose data cannot all be allocated in GPU memory. Our DGEMM routine achieves up to 472 GFlop/s with one GPU and 921 GFlop/s with two GPUs on a single system.
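
To illustrate what a "layout function" means here, the following is a minimal sketch of the standard Z-Morton (bit-interleaved) mapping from a 2D element index to a 1D memory offset, compared with plain row-major order. It is not the paper's X-Morton function, which the abstract does not define, and it is not taken from the authors' kernel. The names `part1by1`, `z_morton`, and `row_major` are hypothetical and chosen only for this example.

```python
def part1by1(x: int) -> int:
    """Spread the low 16 bits of x so a zero bit sits between each original bit."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def z_morton(i: int, j: int) -> int:
    """Z-Morton offset: row bits go to odd positions, column bits to even positions.

    Illustrative only; the X-Morton variant studied in the paper is a
    different function from the same family.
    """
    return (part1by1(i) << 1) | part1by1(j)


def row_major(i: int, j: int, n: int) -> int:
    """Conventional row-major offset for an n x n matrix, for comparison."""
    return i * n + j


if __name__ == "__main__":
    n = 4
    for i in range(n):
        # Each row shows where element (i, j) lands under each layout.
        print([(row_major(i, j, n), z_morton(i, j)) for j in range(n)])
```

Under a Morton-style layout, elements that are close in both row and column tend to be close in memory, which is the usual motivation for trying such layouts when cache hit rate matters.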

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 4, 2011, Pages 342-351