Cholesky factorization on SIMD multi-core architectures

Article code	Journal code	Publication year	English article	Full-text version
4956225	1444442	2017	15 pages, PDF	Free download

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications

Abstract

Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited to big matrices but perform slowly on small ones. Even though state-of-the-art studies have begun to take an interest in small matrices, the matrices they consider usually have a few hundred rows. Fields like Computer Vision or High Energy Physics use tiny matrices. In this paper we show that it is possible to speed up the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide High Level Transformations that accelerate the factorization for current multi-core and many-core SIMD architectures (SSE, AVX2, KNC, AVX512, Neon, Altivec). We focus on the fact that, on some architectures, compilers are unable to vectorize and, on other architectures, vectorizing compilers are not efficient. Thus hand-made SIMDization is mandatory. With these transformations combined with SIMD, we achieve a speedup from ×14 to ×28 for the whole resolution in single precision compared to the naive code on an AVX2 machine, and a speedup from ×6 to ×14 in double precision, both with strong scalability.

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Systems Architecture - Volume 79, September 2017, Pages 1-15

Authors

Florian Lemaitre, Benjamin Couturier, Lionel Lacassagne