Article code: 524052
Journal code: 868548
Publication year: 2014
English article: 17 pages (PDF)
Full-text version: free download
English title of the ISI article
Scheduling directives: Accelerating shared-memory many-core processor execution
Persian translation of the title
دستورالعمل های زمانبندی: سرعت اجرای اشیاء پردازنده چند هسته ای به اشتراک گذاشته شده
Keywords
Scheduling and task partitioning; shared memory; parallel processing; fine-grain parallelism; data dependencies
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
English abstract


• Setting: a real PRAM-like (shared-cache) architecture; task-oriented programming; ultra-fast task dispatch.
• Task precedence ensures correctness. We add further performance considerations (memory and more).
• E.g., scheduling directives for replicas of concurrent tasks: Start-After-Start, lockstep with slack.
• Tests: (1) intuitive need plus a toy example, (2) a simple hardware implementation is possible.
• Target communities: parallel algorithm developers, parallel compiler writers, hardware architects.

We consider many-core processors with a task-graph-oriented programming model, in which scheduling constraints among tasks are decided offline and then enforced by the runtime system using dedicated hardware. In this setting, exposing and exploiting fine-grain data and control parallelism is increasingly important, so high expressive power for stating such constraints (directives), together with the ability to implement them in fast, simple hardware, is critical. In this paper, we focus on the relationship among different duplicable (multi-instance) tasks, which are used to express and exploit data parallelism. We extend the conventional Start-After-Complete (precedence) constraint so that it can also be applied between replicas of different such tasks rather than only between entire tasks, thereby increasing the exposable parallelism. Additionally, we propose the parameterized Start-After-Start constraint, which can be used to control the degree of “lockstep” among multiple such tasks, e.g., to improve cache performance when the tasks work on the same data. We also briefly describe several additional directives. Finally, we show that the directives can be supported efficiently in hardware. The discussion uses Hypercore, a very efficient CREW PRAM-like shared-cache architecture; it is a challenging target because its dispatching of basic constraints is already extremely fast. However, the new directives have broader applicability. The demonstrated possibility of simple implementation, together with the indications of benefit, motivates further exploration of these directives, their implementation in hardware, and their support by programming tools.
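
The abstract does not give the exact semantics of the parameterized Start-After-Start constraint, so the following Python sketch is only one plausible reading, not the paper's definition or its hardware mechanism. It assumes two duplicable tasks A and B with n replicas each, replicas that start in index order, and a hypothetical `offset` parameter: replica i of the successor may start once replica i - offset of the predecessor has started. Lockstep with slack is then modelled as a pair of such constraints. The names `sas_ready`, `simulate`, and `max_lead` are illustrative, and dispatch is sequential for simplicity.

```python
def sas_ready(pred_started: int, i: int, offset: int) -> bool:
    """Assumed Start-After-Start test.

    Replica i of the successor may start once replica (i - offset) of the
    predecessor has started. Replicas with i < offset are unconstrained.
    Replicas are assumed to start in index order, so a counter of started
    predecessor replicas is enough.
    """
    return i - offset < pred_started


def simulate(n: int, slack: int) -> list:
    """Greedily dispatch n replicas of A and n replicas of B, one at a time.

    Two assumed constraints are enforced:
      B[i] starts after A[i] has started          (offset 0)
      A[i] starts after B[i - slack] has started  (offset slack)
    A is preferred whenever it is allowed to start.
    """
    started_a = started_b = 0
    order = []
    while started_b < n:
        if started_a < n and sas_ready(started_b, started_a, slack):
            order.append(("A", started_a))
            started_a += 1
        elif sas_ready(started_a, started_b, 0):
            order.append(("B", started_b))
            started_b += 1
        else:
            # With slack == 0 the two constraints wait on each other.
            raise RuntimeError("no replica can start: constraints deadlock")
    return order


def max_lead(order: list) -> int:
    """Largest number of replicas by which A ran ahead of B."""
    lead = best = 0
    for name, _ in order:
        lead += 1 if name == "A" else -1
        best = max(best, lead)
    return best


if __name__ == "__main__":
    for slack in (1, 2, 4):
        trace = simulate(4, slack)
        print(slack, max_lead(trace), trace)
```

Under these assumptions, a small slack forces A and B to alternate closely, which is the kind of behaviour that could keep shared data in cache, while a slack equal to n removes the coupling in one direction and lets A finish starting before B begins.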

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 40, Issue 2, February 2014, Pages 90–106