Design, implementation, and evaluation of a low-complexity vector-core for executing scalar/vector instructions

Article code	Journal code	Publication year	English article	Full text
432418	688884	2013	15-page PDF	Free download


Keywords

Pipelining; Performance evaluation; Vector processing

Related topics

Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics


Abstract

• We propose a low-complexity vector-core called LcVc.
• LcVc has a common execution datapath for executing scalar/vector instructions.
• LcVc is implemented using VHDL targeting the Xilinx FPGA Spartan 3E.
• We evaluate the performance of LcVc on vector/matrix kernels.
• Only a small amount of hardware (5%) is required to support the enhanced vector capability.

This paper proposes a low-complexity vector-core called LcVc for executing both scalar and vector instructions on the same execution datapath. A unified register file in the decode stage is used for storing both scalar operands and vector elements. The execution stage accepts a new set of operands each cycle and produces a new result. Rather than issuing a vector instruction (1-D operations) as a whole, each vector operation is issued sequentially with the existing scalar issue hardware. In the first implementation of LcVc, all loads and stores of registers take place from the data cache in the memory access stage at a rate of one element per clock cycle. The complete design of our proposed LcVc processor is implemented using VHDL targeting the Xilinx FPGA Spartan 3E, xc3s1600e-4-fg320 device. The total number of slices required for implementing LcVc is 1778, where the number of slice flip-flops is 538 and the number of 4-input LUTs is 3706: 1914 for logic and 1792 for RAMs. Moreover, our performance evaluation results show that the speedups of executing vector addition, vector scaling, SAXPY, and matrix–matrix multiplication on LcVc over scalar execution are 2.3, 2.5, 1.9, and 3, respectively. The hardware required to support the enhanced vector capability is insignificant (5%), which results in reducing the area per core and increasing the number of cores available in a given chip area.
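
For readers unfamiliar with the four kernels named in the abstract, the following is a minimal Python sketch of what each one computes. It is only a functional reference, not the authors' benchmark code or the LcVc instruction sequence; the function names and the use of plain Python lists are assumptions made for illustration. Each loop iteration corresponds to one element operation, which is the unit the abstract describes as being issued sequentially.

```python
# Functional reference for the kernels named in the abstract.
# Names and data layout are illustrative assumptions, not taken from the paper.

def vector_add(x, y):
    """Vector addition: z[i] = x[i] + y[i]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(x, y)]


def vector_scale(alpha, x):
    """Vector scaling: z[i] = alpha * x[i]."""
    return [alpha * a for a in x]


def saxpy(alpha, x, y):
    """SAXPY: z[i] = alpha * x[i] + y[i]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [alpha * a + b for a, b in zip(x, y)]


def matmul(a, b):
    """Matrix-matrix multiplication: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for p in range(inner):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c
```

For example, `saxpy(2, [1, 2, 3], [4, 5, 6])` combines a scaling and an addition per element, which is why it is commonly used as a compact vector benchmark.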

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 6, June 2013, Pages 836–850

Authors

Mostafa I. Soliman,
