Article code: 523860
Journal code: 868508
Publication year: 2016
English article: 11-page PDF
Full-text version: free download
English title of the ISI article
NestedMP: Enabling cache-aware thread mapping for nested parallel shared memory applications
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
English abstract


• Task-core mapping schemes for nested-parallel applications may affect performance.
• NestedMP allows programmers to declare the number of threads for parallel branches.
• NestedMP's runtime is aware of the whole task tree and can make locality-aware task-core mappings.
• We have implemented NestedMP based on GCC 4.8.
• Testing results show that NestedMP improves performance over GCC's OpenMP implementation.

Exploiting multiple levels of parallelism benefits a wide range of applications, because a typical server already has tens of processor cores. As the number of cores per machine keeps growing rapidly, efficient support for nested parallelism will become increasingly important.

We observe that different task-core mapping schemes may result in significant performance differences, because modern HPC servers are NUMA multi-core systems. It is therefore important to control the task-core mapping for nested parallelism. However, the mechanism for managing the number of threads in current parallel programming models, such as OpenMP, does not give runtime systems enough information to make optimized decisions. As a result, current nested parallel applications often suffer from suboptimal task-core mapping and significant performance loss.

To address this problem, we propose NestedMP, a set of directives that extends OpenMP. NestedMP specifies the number of threads of each nested parallel branch declaratively and lets runtime systems see the whole task tree, so that they can make locality-aware task-core mappings. We have implemented NestedMP in GCC 4.8.2 and tested its performance on a 4-way 8-core Sandy Bridge server. The results show that NestedMP improves performance significantly over GCC's OpenMP implementation.
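
The idea of a runtime that sees the whole task tree before placing threads can be illustrated with a small sketch. This is not NestedMP's directive syntax or the authors' mapping algorithm, neither of which is given in the abstract. The `Task` class, the `map_tree` function and the depth-first contiguous placement rule are all hypothetical, and the only detail taken from the text is the 4-socket, 8-cores-per-socket machine shape.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Task:
    """A node in a declared task tree (hypothetical structure, for illustration only)."""
    name: str
    threads: int = 1                      # threads requested by a leaf branch
    children: List["Task"] = field(default_factory=list)

    def total_threads(self) -> int:
        # A leaf needs its own threads; an inner node needs the sum of its branches.
        if not self.children:
            return self.threads
        return sum(child.total_threads() for child in self.children)


def map_tree(root: Task, sockets: int = 4,
             cores_per_socket: int = 8) -> Dict[str, List[Tuple[int, int]]]:
    """Assign each leaf branch a contiguous run of cores, visiting the tree depth first.

    Returns a dict from leaf name to a list of (socket, core-in-socket) pairs.
    Assumed placement rule: contiguous core numbers, so a branch whose size
    matches the socket size ends up on a single socket.
    """
    if root.total_threads() > sockets * cores_per_socket:
        raise ValueError("task tree requests more threads than available cores")

    mapping: Dict[str, List[Tuple[int, int]]] = {}
    next_core = 0

    def visit(task: Task) -> None:
        nonlocal next_core
        if task.children:
            for child in task.children:
                visit(child)
            return
        cores = range(next_core, next_core + task.threads)
        mapping[task.name] = [(c // cores_per_socket, c % cores_per_socket) for c in cores]
        next_core += task.threads

    visit(root)
    return mapping


# Four outer branches, each declaring an inner team of 8 threads.
tree = Task("root", children=[Task(f"branch{i}", threads=8) for i in range(4)])
placement = map_tree(tree)
sockets_used = {name: sorted({s for s, _ in cores}) for name, cores in placement.items()}
print(sockets_used)  # {'branch0': [0], 'branch1': [1], 'branch2': [2], 'branch3': [3]}
```

Because all thread counts are known before any thread is placed, each inner team in this example stays within one socket. With branch sizes that do not divide the socket size evenly, this simple rule would let a team straddle two sockets, so a real runtime would need a more careful policy.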

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 51, January 2016, Pages 56–66