Performance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions (AVX)

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
4956875	1364714	2016	19 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر شبکه های کامپیوتری و ارتباطات

پیش نمایش صفحه اول مقاله

Performance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions (AVX)

چکیده انگلیسی

Intel's Advanced Vector Extensions is known as single instruction multiple data streams (SIMD), and the instruction sets is introduced in the second-generation Intel Core processor family. This new technology is supported by new generations of Intel and AMD processors. The advanced vector extensions (AVX) exploits single instruction multiple data (SIMD) computing units for fine grained-parallelism. These instructions process multiple data elements simultaneously and independently. Many applications such as signal processing, recognition, visual processing, scientific and engineering numerical, physics and other areas of applications need for vector floating point performance supported by AVX. Matrix-Matrix multiplications is the core of many important algorithms such as signal processing, scientific and engineering numerical, so it is substantial to accelerate implementation of matrix-matrix multiplications. It is very important to use appropriate compilers that can optimally utilize the new features of the evolving processors. For this purpose, a clear vision on the performance of the compilers on performance characteristics of AVX is needed. In addition choosing the appropriate programming method is substantial to gain the best performance. In this paper, the performance evaluation of matrix-matrix multiplications in three forms (CÂ =Â A. B, CÂ =Â A. BT, and CÂ =Â AT. B), using Intel's advanced vector extension (AVX) instruction sets has been reported. The obtained results are compared using inline assembly versus intrinsic functions for programming. A comparative study to indicate the effects of two widely used C++ compilers: Intel C++ compiler (ICC) in Intel Parallel Studio XE 2016 against Microsoft Visual Studio C++ compiler 2015 (MSVC++) has been investigated. The results are evaluated on Intel Core i7 processor on a Broadwell system for square matrices of different large sizes. The results demonstrate that the Intel compiler has better performance than MSVC++ compiler by 1.34, 1.32, and 1.22 using inline assembly language and by 1.36, 1.19, and 1.25 using intrinsic functions for CÂ =Â A. B, CÂ =Â A. BT, and CÂ =Â AT. B, respectively. The performance of using intrinsic functions compared to the inline assembly demonstrates that the intrinsic functions has better performance than inline assembly by 2.1, 2.13, and 2.18 using Intel compiler and by 2.08, 2.49, and 2.11 using MSVC++ compiler for CÂ =Â A. B, CÂ =Â A. BT, and CÂ =Â AT. B, respectively.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Microprocessors and Microsystems - Volume 47, Part B, November 2016, Pages 369-374

نویسندگان

Somaia Awad Hassan, A.M. Hemeida, Mountasser M.M. Mahmoud,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Performance Evaluation of Matrix-Matrix Multiplications Using Intel's Advanced Vector Extensions (AVX)

دسترسی سریع

ارتباط

English Website