Article code: 523829
Journal code: 868503
Publication year: 2016
English article: 13-page PDF
Full-text version: free download
English title of the ISI article
Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• We improve performance of fine-grain UPC applications by orders of magnitude.
• We introduce a novel shared-data localization transformation.
• We present a thorough performance analysis and evaluation.
• We show that reducing run-time calls is crucial for performance.
• We achieve performance comparable to C and MPI using the UPC programming model.

Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance. This paper addresses this issue and introduces various techniques that aim at reducing the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs) [S. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, Cambridge Monographs on Mathematical Physics, Cambridge University Press, 2003.], the inlining of data locality checks, and the usage of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body. A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase performance of UPC programs up to 1.8× their UPC hand-optimized counterpart for applications with regular accesses and up to 6.3× for applications with irregular accesses.
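
The inspector-executor idea described in the abstract can be illustrated with a minimal sketch in plain C. This is not the paper's compiler transformation and it uses no UPC run-time API: `remote_array`, `remote_get`, `remote_gather`, and the `remote_calls` counter are hypothetical stand-ins that only simulate the cost of run-time calls on a shared array. The inspector records the indices a loop will touch in an index vector, a single bulk operation fetches them, and the executor then runs on the local copy.

```c
#include <assert.h>
#include <stddef.h>

#define N_ELEMS 16

/* Hypothetical stand-in for a remote shared array (a `shared` array in UPC). */
static double remote_array[N_ELEMS];
/* Counts simulated run-time calls, so the two versions can be compared. */
static int remote_calls = 0;

/* Simulated fine-grained remote read: one run-time call per element. */
static double remote_get(size_t i) {
    remote_calls++;
    return remote_array[i];
}

/* Simulated coalesced read: one run-time call for a whole index vector. */
static void remote_gather(const size_t *idx, size_t n, double *out) {
    remote_calls++;
    for (size_t k = 0; k < n; k++) {
        out[k] = remote_array[idx[k]];
    }
}

/* Naive loop: every iteration issues its own remote access. */
double sum_fine_grained(const size_t *idx, size_t n) {
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        sum += remote_get(idx[k]);
    }
    return sum;
}

/* Inspector-executor version of the same loop. */
double sum_inspector_executor(const size_t *idx, size_t n) {
    size_t index_vector[N_ELEMS];
    double local_buf[N_ELEMS];
    assert(n <= N_ELEMS);

    /* Inspector: walk the loop once and record which elements are needed. */
    for (size_t k = 0; k < n; k++) {
        index_vector[k] = idx[k];
    }

    /* Coalesce: fetch all recorded elements with one bulk operation. */
    remote_gather(index_vector, n, local_buf);

    /* Executor: run the original computation on the local buffer. */
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        sum += local_buf[k];
    }
    return sum;
}
```

In this simulation, both functions return the same sum for the same index list, but the fine-grained loop increments `remote_calls` once per element while the inspector-executor version increments it once per loop. That gap is the run-time-call overhead the highlights describe as crucial for performance.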

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 54, May 2016, Pages 2–14