Free article download: Modular vector processor architecture targeting data-level parallelism

Article code	Journal code	Publication year	English article	Full-text version
462647	696882	2015	13-page PDF	Free download

English title of the ISI article

Modular vector processor architecture targeting at data-level parallelism

Persian translation of the title

معماری پردازنده بردار مدولار که هدف آن در همگرایی سطح داده است


Keywords

Vector processor; Parallelism; Performance; Speedup; Benchmarking

Related topics

Engineering and Basic Sciences > Computer Engineering > Computer Networks and Communications


English abstract

Exploiting DLP (Data-Level Parallelism) is indispensable in most data streaming and multimedia applications. Several architectures have been proposed to improve both the performance and the energy consumption of such applications. Superscalar and VLIW (Very Long Instruction Word) processors, along with SIMD (Single-Instruction Multiple-Data) and vector processor (VP) accelerators, are among the options available to designers for meeting their requirements. We present an innovative VP architecture that separates the path for data shuffle and memory-indexed accesses from the data path used by other vector instructions that access memory. This separation speeds up the most common memory access operations by avoiding extra delays and unnecessary stalls. In our lane-based VP design, each vector lane uses its own private memory to avoid stalls during memory access instructions. The proposed VP, which is developed in VHDL and prototyped on an FPGA, serves as a coprocessor for one or more scalar cores. Benchmarking shows that our VP can achieve very high performance. For example, it achieves a speedup of more than 1500-fold in the color space conversion benchmark compared to running the code on a scalar core. Distributing data shuffle engines across the vector lanes substantially reduces execution time, primarily for applications such as the FFT (Fast Fourier Transform) that require large amounts of data shuffling. Compared to running the benchmark on a VP without the shuffle engines, the speedup for the 64-point FFT is 5.92 without compiler optimization and 7.33 with it. Compared to runs on the scalar core, the speedups for this benchmark are 52.07 without compiler optimization and 110.45 with it.
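To make the kind of "data shuffling" an FFT needs more concrete, here is a minimal Python sketch of the bit-reversal permutation that radix-2 FFTs commonly apply to reorder their data. The abstract does not say which shuffle patterns the proposed shuffle engines implement, so this is only an illustration of one well-known pattern, not the authors' design; the function names `bit_reverse_indices` and `shuffle` are hypothetical.

```python
def bit_reverse_indices(n: int) -> list[int]:
    """Return the bit-reversal permutation for n points (n a power of two).

    Illustrative only: a common reordering in radix-2 FFTs, assumed here
    as an example of a shuffle pattern; not taken from the paper.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    bits = n.bit_length() - 1  # number of index bits, e.g. 6 for n = 64
    indices = []
    for i in range(n):
        reversed_i = 0
        for b in range(bits):
            if i & (1 << b):
                # Bit b of i becomes bit (bits - 1 - b) of the result.
                reversed_i |= 1 << (bits - 1 - b)
        indices.append(reversed_i)
    return indices


def shuffle(data: list) -> list:
    """Reorder data so that output[i] = data[bit_reverse(i)]."""
    return [data[j] for j in bit_reverse_indices(len(data))]


if __name__ == "__main__":
    # First few source indices for a 64-point transform.
    print(bit_reverse_indices(64)[:8])
```

Neighboring output elements come from widely separated input positions, which gives a sense of why such reordering can be costly when it has to go through an ordinary memory access path.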

Publisher

Database: Elsevier - ScienceDirect
Journal: Microprocessors and Microsystems - Volume 39, Issues 4–5, June–July 2015, Pages 237–249

Authors

Seyed A. Rooholamin, Sotirios G. Ziavras
