Article code: 490490
Journal code: 707499
Publication year: 2013
English full text: 10 pages, PDF
English title (ISI-indexed article)
Achieving Checkpointing Global Consistency Through a Hybrid Compile Time and Runtime Protocol
Related subjects
Engineering and Basic Sciences / Computer Engineering / Computer Science (General)
Abstract

The execution times of large-scale parallel applications on modern multi/many-core systems are usually longer than the mean time between failures (MTBF) of those systems. Therefore, parallel applications must tolerate hardware failures so that a machine failure does not cause all computation to be lost. Checkpointing and rollback recovery are very useful techniques for implementing fault-tolerant applications. In parallel applications, a checkpointing protocol is required to guarantee that the individual checkpoints form a consistent global state. Coordinated approaches are the most popular solution for achieving global checkpointing consistency. However, their main drawback is poor scalability, caused by the runtime coordination they require. This work presents a new hybrid protocol that combines the detection of valid recovery lines at compile time with a lightweight, asynchronous runtime protocol that negotiates the closest valid recovery line. Experimental results demonstrate the efficiency and scalability of the proposal.
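The abstract depends on the idea of a consistent global state. The sketch below is not the paper's hybrid protocol. It illustrates, under a simplified message-counting model assumed here, what "consistent" and "recovery line" mean. A set of local checkpoints (one per process) is consistent when no checkpoint records receiving a message that the sender's checkpoint has not recorded sending (an "orphan" message). The names `Checkpoint`, `find_orphan`, `is_consistent` and `latest_recovery_line` are hypothetical. The search is plain rollback propagation, a classic technique for uncoordinated checkpoints, not the compile-time/runtime negotiation the paper proposes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Message counters one process records when it takes a local checkpoint.

    Simplified model (an assumption for illustration):
    sent[q] = messages this process has sent to process q so far.
    recv[q] = messages this process has received from process q so far.
    """
    sent: tuple[int, ...]
    recv: tuple[int, ...]


def find_orphan(history, line):
    """Return (sender, receiver) for the first orphan message found, else None.

    history[p] is the list of checkpoints of process p, oldest first;
    line[p] is the index of the checkpoint chosen for process p.
    """
    n = len(history)
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            sent_by_p = history[p][line[p]].sent[q]
            recv_by_q = history[q][line[q]].recv[p]
            # q's checkpoint saw more messages from p than p's checkpoint sent.
            if recv_by_q > sent_by_p:
                return p, q
    return None


def is_consistent(history, line):
    """A global checkpoint is consistent if it contains no orphan messages."""
    return find_orphan(history, line) is None


def latest_recovery_line(history):
    """Start from the latest checkpoints and roll back orphan receivers.

    Assumes index 0 of every process is an all-zero initial checkpoint,
    which can never record an orphan, so the loop always terminates.
    """
    line = [len(cps) - 1 for cps in history]
    while (orphan := find_orphan(history, line)) is not None:
        _, receiver = orphan
        line[receiver] -= 1
    return line


# P0 checkpoints, then sends m to P1; P1 receives m, then checkpoints.
zero = Checkpoint((0, 0), (0, 0))
history = [
    [zero, Checkpoint((0, 0), (0, 0))],
    [zero, Checkpoint((0, 0), (1, 0))],  # P1 recorded receiving m from P0
]
print(latest_recovery_line(history))  # → [1, 0]
```

Rolling back one receiver lowers its own send counters, which can create new orphans at other processes; repeated far enough, this is the domino effect that coordinated protocols avoid by construction.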

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 18, 2013, Pages 169-178