Article code  Journal code  Publication year  English article  Full text
523844  868506  2013  14 pages, PDF  Free download
English title of the ISI article
A scalable infrastructure for the performance analysis of passive target synchronization
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract

Partitioned global address space (PGAS) languages combine the convenient abstraction of shared memory with the notion of affinity, extending multi-threaded programming to large-scale systems with physically distributed memory. However, in spite of their obvious advantages, PGAS languages still lack appropriate tool support for performance analysis, one of the reasons why their adoption is still in its infancy. Some of the performance problems for which tool support is needed occur at the level of the underlying one-sided communication substrate, such as the Aggregate Remote Memory Copy Interface (ARMCI). One such example is the waiting time in situations where asynchronous data transfers cannot be completed without software intervention at the target side. This is not uncommon on systems with reduced operating-system kernels such as IBM Blue Gene/P where the use of progress threads would double the number of cores necessary to run an application. In this paper, we present an extension of the Scalasca trace-analysis infrastructure aimed at the identification and quantification of progress-related waiting times at larger scales. We demonstrate its utility and scalability using a benchmark running with up to 32,768 processes.


► Replay-based trace analysis to support passive target synchronization.
► Novel replay schemes for the efficient exchange of performance relevant information.
► Revealing significant impact of absence of remote progress on performance of one-sided communication in polling scenarios.
► Strong and weak scaling measurements on up to 32,768 processes with application kernels.
► Performance measurement of NWChem simulation on 4,096 processes.

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 39, Issue 3, March 2013, Pages 132–145