Design space exploration of multi-core RTL via high level synthesis from OpenCL models

Article code	Journal code	Publication year	English article	Full-text version
11032903	1645042	2018	10 pages	PDF

Keywords

OpenCL; design space exploration; parallel computing; high-level synthesis; FPGA; data center; GPU

Related topics

Engineering and Basic Sciences; Computer Engineering; Computer Networks and Communications

Abstract

As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of available computing power. Designing optimized accelerators that can meet particular requirements has always presented a tremendous challenge to hardware engineers. To do so, designers have to trade off performance for power consumption in a manner such that the final RTL consumes minimum energy to meet the required performance (e.g. FLOPS) target. Moreover, the growing trend towards heterogeneous platforms is crucial to meet time and power consumption constraints of high-performance computing (HPC) applications. The OpenCL parallel programming language and framework enables programming CPU, GPU and recently FPGAs using the high-level synthesis (HLS) methodology. This work presents a design space exploration flow based on execution time, resource utilization and power consumption of OpenCL kernels mapped on FPGAs using the Xilinx high-level synthesis tool chain. Our experiments suggest that the quality of generated solutions, in terms of performance-per-watt, can be determined using analytical formulas prior to implementation, thus enabling fast and accurate DSE by considering on-chip and off-chip sources of parallelism. Moreover, the automated flow suggests design hints to meet a given time constraint within available resources. The proposed technique is demonstrated by optimizing the well known bitonic sorting network from NVIDIA's OpenCL benchmark. Our results report that FPGAs have at least 20% higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology (28 nm). Additionally, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming at least 55% less power at the cost of more expensive devices.
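
For readers unfamiliar with the benchmark named in the abstract, the following is a minimal Python sketch of a bitonic sorting network. It is an illustration of the general algorithm only, not the NVIDIA OpenCL kernel or the authors' implementation; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this sketch. Within each pass of the inner loop every compare-exchange touches a disjoint pair of elements, which is the kind of data parallelism an OpenCL kernel would typically distribute across work-items.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Assumption for this sketch: len(values) is a non-zero power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")

    a = list(values)
    k = 2
    while k <= n:              # k: size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # j: distance between compared elements
            # All iterations of this loop are independent of each other,
            # so a parallel implementation could run them concurrently.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks of size k alternate between ascending and
                    # descending order; the final pass (k == n) is ascending.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

The sequence of comparisons depends only on the input length, never on the data, which is why the network maps naturally onto fixed hardware pipelines.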

Publisher

Database: Elsevier - ScienceDirect
Journal: Microprocessors and Microsystems - Volume 63, November 2018, Pages 199-208

Authors

Mehdi Roozmeh, Luciano Lavagno
