کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
484069 703253 2016 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Portable Application-level Checkpointing for Hybrid MPI-OpenMP Applications
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر علوم کامپیوتر (عمومی)
پیش نمایش صفحه اول مقاله
Portable Application-level Checkpointing for Hybrid MPI-OpenMP Applications
چکیده انگلیسی

As parallel machines increase their number of processors, so does the failure rate of the global system, thus, long-running applications will need to make use of fault tolerance techniques to ensure the successful execution completion. Most of current HPC systems are built as clusters of multicores. The hybrid MPI-OpenMP paradigm provides numerous benefits on these systems. This paper presents a checkpointing solution for hybrid MPI-OpenMP applications, in which checkpoint consistency is guaranteed by using a coordination protocol intra-node, while no inter-node coordination is needed. The proposal reduces network utilization and storage resources in order to optimize the I/O cost of fault tolerance, while minimizing the checkpointing overhead. Besides, the portability of the solution and the dynamic parallelism provided by OpenMP enable the restart of the applications using machines with different architectures, operating systems and/or number of cores, adapting the number of running OpenMP threads for the best exploitation of the available resources. Extensive evaluation using hybrid MPI-OpenMP applications from the ASC Sequoia Benchmark Codes and NERSC-8/Trinity benchmarks is presented, showing the effectiveness and efficiency of the approach.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Procedia Computer Science - Volume 80, 2016, Pages 19–29
نویسندگان
, , , ,