Article code: 424576
Journal code: 685592
Publication year: 2015
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
Picos: A hardware runtime architecture support for OmpSs
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
English abstract


• Picos, a novel hardware dataflow-based task scheduler, is described.
• Latency information for each Picos module is reported, based on the synthesis of the VHDL code.
• We perform a design space exploration of Picos for different computing system sizes.
• Very fine-grained (Pico) tasks are efficiently executed with this hardware support.
• Real application workloads are tested and compared to the software alternative.

OmpSs is a programming model that provides a simple and powerful way of annotating sequential programs to exploit heterogeneity and task parallelism based on runtime data dependency analysis, dataflow scheduling and out-of-order task execution; it has greatly influenced Version 4.0 of the OpenMP standard. The current implementation of OmpSs achieves those capabilities with a pure-software runtime library: Nanos++. As a result, although the model is powerful and easy to use, the performance benefits of exploiting fine-grained (pico) task parallelism are limited by the software runtime overheads. To overcome this handicap we propose Picos, an implementation of the Task Superscalar (TSS) architecture that provides hardware support to the OmpSs programming model. Picos is a novel hardware dataflow-based task scheduler that dynamically analyzes inter-task dependencies and identifies task-level parallelism at runtime. In this paper, we describe the Picos hardware design and the latencies of the main functionality of its components, based on the synthesis of their VHDL design. We have implemented a full cycle-accurate simulator based on those latencies to perform a design exploration of the characteristics and number of its components in a reasonable amount of time. Finally, we present a comparison of the Picos and Nanos++ runtime performance scalability with a set of real benchmarks. With Picos, a programmer can achieve ideal scalability using aggressive parallel strategies with a large number of fine-grained tasks.
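
The runtime dependency analysis and dataflow scheduling mentioned in the abstract can be illustrated with a small software model. The Python sketch below is only an illustration under stated assumptions: it is not the Picos hardware design and not the Nanos++ API, and the names `Task`, `build_dependency_graph` and `dataflow_waves` are hypothetical. Each task declares the data it reads and writes, dependencies are derived from those declarations in program order, and tasks are released as soon as all of their predecessors have finished.

```python
from collections import defaultdict


class Task:
    """Hypothetical task record: a name plus the data it reads and writes."""

    def __init__(self, name, inputs=(), outputs=()):
        self.name = name
        self.inputs = tuple(inputs)
        self.outputs = tuple(outputs)


def build_dependency_graph(tasks):
    """Return, for each task, the set of earlier tasks it must wait for."""
    last_writer = {}                          # datum -> index of last writer
    readers_since_write = defaultdict(list)   # datum -> readers since that write
    preds = [set() for _ in tasks]
    for i, task in enumerate(tasks):
        for datum in task.inputs:             # read-after-write
            if datum in last_writer:
                preds[i].add(last_writer[datum])
        for datum in task.outputs:
            if datum in last_writer:          # write-after-write
                preds[i].add(last_writer[datum])
            preds[i].update(readers_since_write[datum])  # write-after-read
        for datum in task.inputs:
            readers_since_write[datum].append(i)
        for datum in task.outputs:
            last_writer[datum] = i
            readers_since_write[datum] = []
        preds[i].discard(i)                   # a task never waits on itself
    return preds


def dataflow_waves(tasks):
    """Group task names into waves; tasks in one wave can run in parallel."""
    preds = build_dependency_graph(tasks)
    succs = [[] for _ in tasks]
    pending = [len(p) for p in preds]
    for i, p in enumerate(preds):
        for j in p:
            succs[j].append(i)
    ready = [i for i, n in enumerate(pending) if n == 0]
    waves = []
    while ready:
        waves.append([tasks[i].name for i in ready])
        released = []
        for i in ready:                       # task i finishes
            for s in succs[i]:
                pending[s] -= 1
                if pending[s] == 0:           # all predecessors done
                    released.append(s)
        ready = sorted(released)
    return waves


example = [
    Task("A", outputs=["x"]),
    Task("B", outputs=["y"]),
    Task("C", inputs=["x", "y"], outputs=["z"]),
    Task("D", inputs=["x"], outputs=["w"]),
    Task("E", inputs=["z", "w"], outputs=["x"]),
]
print(dataflow_waves(example))
```

In this example, A and B are independent, C and D become ready once their inputs are produced, and E must wait for C and D both because it consumes their results and because it overwrites `x`, which they read. A software runtime pays the cost of this bookkeeping on every task, which is the overhead the abstract says limits very fine-grained tasks.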

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 53, December 2015, Pages 130–139