A scalable infrastructure for the performance analysis of passive target synchronization

Article code	Journal code	Publication year	English article	Full-text version
523844	868506	2013	14-page PDF	Free download

Keywords

One-sided communication; Performance analysis; Event tracing

Related topics

Engineering and Basic Sciences; Computer Engineering; Computer Science Software

Abstract

Partitioned global address space (PGAS) languages combine the convenient abstraction of shared memory with the notion of affinity, extending multi-threaded programming to large-scale systems with physically distributed memory. However, in spite of their obvious advantages, PGAS languages still lack appropriate tool support for performance analysis, one of the reasons why their adoption is still in its infancy. Some of the performance problems for which tool support is needed occur at the level of the underlying one-sided communication substrate, such as the Aggregate Remote Memory Copy Interface (ARMCI). One such example is the waiting time in situations where asynchronous data transfers cannot be completed without software intervention at the target side. This is not uncommon on systems with reduced operating-system kernels such as IBM Blue Gene/P where the use of progress threads would double the number of cores necessary to run an application. In this paper, we present an extension of the Scalasca trace-analysis infrastructure aimed at the identification and quantification of progress-related waiting times at larger scales. We demonstrate its utility and scalability using a benchmark running with up to 32,768 processes.

► Replay-based trace analysis to support passive target synchronization.
► Novel replay schemes for the efficient exchange of performance relevant information.
► Revealing significant impact of absence of remote progress on performance of one-sided communication in polling scenarios.
► Strong and weak scaling measurements on up to 32,768 processes with application kernels.
► Performance measurement of NWChem simulation on 4,096 processes.

Publisher

Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 39, Issue 3, March 2013, Pages 132–145

Authors

Marc-André Hermanns, Sriram Krishnamoorthy, Felix Wolf
