Article code: 523991
Journal code: 868540
Publication year: 2015
English article: 21 pages, PDF
Title of the ISI article
Communication-aware process and thread mapping using online communication detection
Keywords
Shared memory, parallel applications, communication optimization, mapping
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
Abstract


• We perform online detection of inter-process and inter-thread communication.
• The detected communication pattern is used to migrate processes and threads.
• The mechanism is operating system-based and needs no changes to applications or runtime libraries.
• We reduce execution time and energy consumption.
• Evaluations on shared memory machines and a cluster show substantial improvements.

The rising complexity of memory hierarchies and interconnections in parallel shared memory architectures leads to differences in communication performance. These differences can be exploited to perform a communication-aware mapping of parallel applications to the hardware topology, improving their performance and energy efficiency. To perform the mapping, it is necessary to determine the communication behavior of the processes and threads of the application. Previous methods rely on static communication traces to detect communication, require hardware changes, or support only a subset of parallelization models.

We propose CDSM, Communication Detection in Shared Memory, a mechanism that detects communication from page faults and uses this information to perform the mapping. CDSM works at the operating system level during the execution of the parallel application and supports all parallelization models that use shared memory for communication. It does not require modifications to the applications, previous knowledge about their behavior, or changes to the hardware and runtime libraries. Experiments with the MPI, MPI+OpenMP and OpenMP implementations of the NAS parallel benchmarks, the HPCC benchmark and the PARSEC benchmark suite on a shared memory machine show that CDSM has a high detection accuracy with a negligible overhead. Execution time and processor energy consumption were reduced by up to 35.9% and 18.9%, respectively (10.2% and 7.3% on average). Experiments on a cluster system, where CDSM optimizes the communication within each node, showed an average execution time reduction of 10.4%.
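
The abstract does not describe CDSM's internal data structures, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes that page faults arrive as hypothetical `(thread_id, page)` events, that two threads are counted as communicating when one faults on a page recently touched by the other, and that a simple greedy pairing stands in for the mapping step. The names `build_comm_matrix`, `greedy_pairs` and the `history` parameter are invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_comm_matrix(faults, num_threads, history=2):
    """Count communication events from (thread_id, page) fault records.

    Assumption (not from the paper): a fault by thread T on a page whose
    recent accessors include another thread U counts as one communication
    event between T and U.
    """
    matrix = [[0] * num_threads for _ in range(num_threads)]
    recent = defaultdict(list)  # page -> most recent distinct accessors
    for tid, page in faults:
        sharers = recent[page]
        for other in sharers:
            if other != tid:
                matrix[tid][other] += 1
                matrix[other][tid] += 1
        # Move the faulting thread to the front and bound the history.
        if tid in sharers:
            sharers.remove(tid)
        sharers.insert(0, tid)
        del sharers[history:]
    return matrix


def greedy_pairs(matrix):
    """Pair up the threads that communicate most, e.g. to share a cache."""
    n = len(matrix)
    candidates = sorted(combinations(range(n), 2),
                        key=lambda p: matrix[p[0]][p[1]], reverse=True)
    placed, pairs = set(), []
    for a, b in candidates:
        if a not in placed and b not in placed:
            pairs.append((a, b))
            placed.update((a, b))
    return pairs


# Synthetic fault trace: threads 0/1 share page 10, threads 2/3 share page 20.
faults = [(0, 10), (1, 10), (0, 10), (2, 20), (3, 20), (2, 20), (1, 30), (2, 30)]
matrix = build_comm_matrix(faults, num_threads=4)
pairs = greedy_pairs(matrix)
print(matrix)
print(pairs)
```

On this synthetic trace, threads 0 and 1 and threads 2 and 3 end up paired, which is the kind of placement a communication-aware mapping would try to put on cores that share a cache. A real mechanism at the operating system level would additionally have to bound memory use, age old entries, and decide when a migration is worth its cost.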

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 43, March 2015, Pages 43–63