Article code | Journal code | Publication year | English article | Full-text version
11032903 | 1645042 | 2018 | 10 pages, PDF | Free download
English title of the ISI article
Design space exploration of multi-core RTL via high level synthesis from OpenCL models
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Networks and Communications
English abstract
As more and more powerful integrated circuits are appearing on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. Designing optimized accelerators that can meet particular requirements has always presented a tremendous challenge to hardware engineers. To do so, designers have to trade off performance against power consumption such that the final RTL consumes minimum energy while meeting the required performance (e.g., FLOPS) target. Moreover, the growing trend towards heterogeneous platforms is crucial to meeting the time and power consumption constraints of high-performance computing (HPC) applications. The OpenCL parallel programming language and framework enables programming CPUs, GPUs and, more recently, FPGAs using the high-level synthesis (HLS) methodology. This work presents a design space exploration (DSE) flow based on execution time, resource utilization and power consumption of OpenCL kernels mapped on FPGAs using the Xilinx high-level synthesis tool chain. Our experiments suggest that the quality of generated solutions, in terms of performance-per-watt, can be determined using analytical formulas prior to implementation, thus enabling fast and accurate DSE by considering on-chip and off-chip sources of parallelism. Moreover, the automated flow suggests design hints to meet a given time constraint within available resources. The proposed technique is demonstrated by optimizing the well-known bitonic sorting network from NVIDIA's OpenCL benchmark. Our results report that FPGAs have at least 20% higher performance-per-watt with respect to two high-end GPUs manufactured in the same technology (28 nm). Additionally, FPGAs with more available resources and using a more modern process (20 nm) can outperform the tested GPUs while consuming at least 55% less power, at the cost of more expensive devices.
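For readers unfamiliar with the benchmark named in the abstract, the following is a minimal Python sketch of a generic bitonic sorting network. It is not the NVIDIA OpenCL kernel and not the authors' optimized design; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration. The point it illustrates is that, within each stage, every compare-and-swap is independent of the others, which is the kind of parallelism an OpenCL kernel can spread across work-items.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Illustrative sketch only (hypothetical helper, not the benchmark kernel).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    data = list(values)  # work on a copy; the input is left untouched

    k = 2
    while k <= n:            # k: size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # j: distance between the two compared elements
            # Every iteration of this loop is independent of the others,
            # so a parallel device could run one work-item per index i.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

Calling `bitonic_sort` on a list of eight or sixteen numbers returns them in ascending order; the sequence of compare-and-swap pairs depends only on the input length, not on the data, which is what makes the network a natural fit for fixed hardware pipelines.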
Publisher
Database: Elsevier - ScienceDirect
Journal: Microprocessors and Microsystems - Volume 63, November 2018, Pages 199-208