Article code: 492173
Journal code: 721145
Publication year: 2015
English article: 18-page PDF
Full-text version: free download
English title of the ISI article
Time and energy modeling of high-performance Level-3 BLAS on x86 architectures
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

We present accurate piece-wise models for the time and energy costs of high-performance implementations of both the matrix multiplication (gemm) and the triangular system solve with multiple right-hand sides (trsm) on x86 architectures. Our methodology decouples the costs of the floating-point arithmetic and data movement occurring in the higher levels of the cache hierarchy from those of packing and data transfers between the main memory and the L2/L3 cache. A careful analytical study of the data transfers, combined with an architecture-specific calibration of the costs per operation, yields the components from which we assemble piece-wise models that accurately estimate the performance of gemm and trsm on x86 processors. Our experimental results on an Intel Xeon E5-2620 processor confirm the accuracy of this approach: the relative errors for different shapes of gemm and trsm are, on average, around 1.5% and 4.5%, respectively, for both time and energy.
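The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the `Calibration` fields, the byte-count formula, and the rule that switches the transfer cost once the operands exceed a cache size are all assumptions made here for illustration. The only parts taken from the abstract are the split into an arithmetic term and a data-transfer term, the use of calibrated per-operation costs, and the piece-wise form.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    """Hypothetical per-operation costs obtained by calibrating one machine."""
    sec_per_flop: float         # cost of one flop with operands already in cache
    sec_per_byte_cached: float  # transfer cost per byte while operands fit in cache
    sec_per_byte_memory: float  # transfer cost per byte once they spill to main memory
    cache_bytes: int            # assumed capacity at which the model switches pieces


def gemm_time_estimate(m: int, n: int, k: int, cal: Calibration,
                       elem_bytes: int = 8) -> float:
    """Estimate the time of C += A*B as arithmetic cost plus transfer cost."""
    # Arithmetic term: gemm performs 2*m*n*k floating-point operations.
    arithmetic = 2.0 * m * n * k * cal.sec_per_flop
    # Transfer term (simplifying assumption): read A and B, read and write C once.
    moved = elem_bytes * (m * k + k * n + 2 * m * n)
    # Piece-wise choice of the per-byte cost, based on the assumed footprint.
    if moved <= cal.cache_bytes:
        per_byte = cal.sec_per_byte_cached
    else:
        per_byte = cal.sec_per_byte_memory
    return arithmetic + moved * per_byte


def relative_error(estimate: float, measured: float) -> float:
    """Relative error of a model estimate against a measurement."""
    return abs(estimate - measured) / measured
```

An energy estimate could presumably reuse the same structure with joules-per-operation coefficients in place of seconds; that, too, is an assumption rather than something the abstract states.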

Publisher
Database: Elsevier - ScienceDirect
Journal: Simulation Modelling Practice and Theory - Volume 55, June 2015, Pages 77–94