Article code: 432418
Journal code: 688884
Publication year: 2013
English article: 15-page PDF
Full-text version: Free download
English title of the ISI article
Design, implementation, and evaluation of a low-complexity vector-core for executing scalar/vector instructions
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• We propose a low-complexity vector-core called LcVc.
• LcVc has a common execution datapath for executing scalar/vector instructions.
• LcVc is implemented using VHDL targeting the Xilinx FPGA Spartan 3E.
• We evaluate the performance of LcVc on vector/matrix kernels.
• Supporting the enhanced vector capability requires little additional hardware (5%).

This paper proposes a low-complexity vector-core, called LcVc, that executes both scalar and vector instructions on the same execution datapath. A unified register file in the decode stage stores both scalar operands and vector elements. Each cycle, the execution stage accepts a new set of operands and produces a new result. Rather than issuing a vector instruction (a 1-D operation) as a whole, LcVc issues each of its element operations sequentially using the existing scalar issue hardware. In the first implementation of LcVc, all register loads and stores go through the data cache in the memory access stage at a rate of one element per clock cycle. The complete design of the proposed LcVc processor is implemented in VHDL, targeting the Xilinx Spartan 3E FPGA (xc3s1600e-4-fg320 device). The implementation requires 1778 slices, with 538 slice flip-flops and 3706 4-input LUTs (1914 for logic and 1792 for RAMs). The performance evaluation shows speedups over scalar execution of 2.3, 2.5, 1.9, and 3 for vector addition, vector scaling, SAXPY, and matrix–matrix multiplication, respectively. The hardware required to support the enhanced vector capability is small (5%), which reduces the area per core and increases the number of cores that fit in a given chip area.
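The sequential-issue idea in the abstract can be illustrated with a small software model. The sketch below is not the authors' design: the class `ToyCore`, its 64-entry register file, the register layout in `saxpy`, and the `cycles` counter are all hypothetical names and sizes chosen for illustration. It only shows a vector instruction being expanded into element operations that reuse the scalar path over a unified register file; it does not model the pipeline, the data cache, or timing, so it cannot reproduce the reported speedups.

```python
import operator
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Toy model: one register file holds scalars and vector elements."""
    # Hypothetical size; the abstract does not state the register count.
    regs: list = field(default_factory=lambda: [0.0] * 64)
    # Counts issued element operations, not real clock cycles.
    cycles: int = 0

    def scalar_op(self, op, dst, a, b):
        # One scalar instruction: a single pass through the execution path.
        self.regs[dst] = op(self.regs[a], self.regs[b])
        self.cycles += 1

    def vector_op(self, op, dst, a, b, vlen, b_scalar=False):
        # The vector instruction is not issued as a whole. Each element
        # operation is issued in turn through the same scalar path.
        for i in range(vlen):
            self.scalar_op(op, dst + i, a + i, b if b_scalar else b + i)


def saxpy(core, alpha, x, y):
    """Compute alpha * x + y with two vector instructions on the toy core."""
    n = len(x)
    A, X, Y = 0, 8, 8 + n  # hypothetical register layout
    core.regs[A] = alpha
    core.regs[X:X + n] = x
    core.regs[Y:Y + n] = y
    core.vector_op(operator.mul, X, X, A, n, b_scalar=True)  # x <- alpha * x
    core.vector_op(operator.add, Y, X, Y, n)                 # y <- x + y
    return core.regs[Y:Y + n]


core = ToyCore()
result = saxpy(core, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result, core.cycles)  # [12.0, 24.0, 36.0] 6
```

In this model a vector instruction of length n costs n element issues, one per pass through the scalar path, which mirrors the abstract's statement that the execution stage accepts one new set of operands per cycle.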

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 6, June 2013, Pages 836–850
Authors