A shared matrix unit for a chip multi-core processor

Article code	Journal code	Publication year	English article	Full-text version
431839	688638	2013	11-page PDF	Free download


Keywords

Parallel processing; Multi-core processors

Related topics

Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics


Abstract

• We propose extending multi-core processors with a common matrix unit.
• A cycle-accurate model is implemented in SystemC to simulate the proposed design.
• Linear algebra kernels, DCT, SAD, and affine transformation are used to evaluate the performance.
• Sharing the matrix unit between two cores improves its utilization by 9%–26%.
• Average speedup ranges from 6% to 24% and maximum speedup ranges from 13% to 46%.
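
For readers unfamiliar with the evaluation kernels named in the highlights, the following is a minimal reference sketch of three of them in plain Python: one linear algebra kernel (scalar times vector plus another vector), the sum of absolute differences (SAD), and a 2-D affine transformation. These are the textbook definitions only. The function names are hypothetical, and nothing here reflects how the paper maps the kernels onto its matrix unit.

```python
def saxpy(a, x, y):
    """Scalar times vector plus another vector: a*x + y, elementwise."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(p - q)
        for row_a, row_b in zip(block_a, block_b)
        for p, q in zip(row_a, row_b)
    )


def affine_transform(points, m, t):
    """Apply p' = M p + t to each 2-D point; m is a 2x2 matrix, t a 2-vector."""
    return [
        (m[0][0] * x + m[0][1] * y + t[0],
         m[1][0] * x + m[1][1] * y + t[1])
        for x, y in points
    ]
```

All three are data-parallel: every output element (or partial sum, in the case of SAD) can be computed independently, which is the property a vector or matrix unit exploits.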

This paper proposes extending a multi-core processor with a common matrix unit to maximize on-chip resource utilization and to leverage the advantages of the current multi-core revolution to improve the performance of data-parallel applications. Each core fetches scalar/vector/matrix instructions from its instruction cache. Scalar instructions continue executing on the core's scalar datapath, whereas vector/matrix instructions are issued by the decode stage to the shared matrix unit through the corresponding FIFO queue. Scalar results of vector/matrix reduction instructions are sent back from the matrix unit to the scalar core that issued them. Some dense linear algebra kernels (scalar–vector multiplication, scalar times vector plus another vector, apply Givens rotation, rank-1 update, vector–matrix multiplication, and matrix–matrix multiplication) as well as discrete cosine transform, sum of absolute differences, and affine transformation are used in the performance evaluation. Our results show that sharing the matrix unit between two cores improves its utilization by 9% to 26% compared with a single core extended with a matrix unit. Moreover, the average speedup of the dual-core shared matrix unit over a single core extended with a matrix unit ranges from 6% to 24%, and the maximum speedup ranges from 13% to 46%.
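
The instruction flow described in the abstract can be illustrated with a small Python toy model. It is a sketch under stated assumptions, not the paper's SystemC model: the round-robin arbitration between queues, the opcode names (`vdot`, `vscale`, `add`), the instruction tuple format, and the class and function names are all hypothetical. It is also not cycle-accurate, since every core's program is decoded before the matrix unit starts draining its queues.

```python
from collections import deque


class SharedMatrixUnit:
    """Toy shared unit with one FIFO queue per core (names are hypothetical)."""

    def __init__(self, num_cores):
        self.queues = [deque() for _ in range(num_cores)]
        self.next_core = 0  # round-robin pointer; arbitration policy is an assumption

    def issue(self, core_id, op, operands):
        """Called by a core's decode stage for vector/matrix instructions."""
        self.queues[core_id].append((op, operands))

    def step(self):
        """Execute one queued instruction; return (core_id, op, result) or None."""
        n = len(self.queues)
        for offset in range(n):
            core_id = (self.next_core + offset) % n
            if self.queues[core_id]:
                op, operands = self.queues[core_id].popleft()
                self.next_core = (core_id + 1) % n
                return core_id, op, self._execute(op, operands)
        return None  # all queues are empty

    @staticmethod
    def _execute(op, operands):
        if op == "vdot":  # reduction: produces a scalar for the issuing core
            x, y = operands
            return sum(a * b for a, b in zip(x, y))
        return None  # non-reduction: the result stays inside the matrix unit


def run(programs):
    """programs: one list per core of (kind, op, operands) tuples."""
    unit = SharedMatrixUnit(len(programs))
    scalar_trace = [[] for _ in programs]  # ops executed on each scalar datapath
    returned = [[] for _ in programs]      # scalars sent back to each core
    for core_id, program in enumerate(programs):
        for kind, op, operands in program:
            if kind == "scalar":
                scalar_trace[core_id].append(op)
            else:
                unit.issue(core_id, op, operands)
    done = unit.step()
    while done is not None:
        core_id, _op, result = done
        if result is not None:
            returned[core_id].append(result)
        done = unit.step()
    return scalar_trace, returned


programs = [
    [("scalar", "add", ()), ("vector", "vdot", ([1, 2, 3], [4, 5, 6]))],
    [("vector", "vscale", (2, [1, 1])), ("vector", "vdot", ([1, 1], [2, 3]))],
]
scalar_trace, returned = run(programs)
```

In this run, core 0 keeps its `add` on the scalar datapath and receives the reduction result 32, while core 1 receives 5. Each result is routed by the `core_id` of the queue the instruction came from, mirroring the abstract's statement that reduction results return to the core that sent the instruction.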

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 8, August 2013, Pages 1146–1156

Authors

Mostafa I. Soliman, Abdulmajid F. Al-Junaid
