Article code: 10330118
Journal code: 685743
Publication year: 2005
English article: 7 pages (PDF)
English title of the ISI article
A Channel Memory based fault tolerance for MPI applications
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract
Fault-tolerant message passing environments protect parallel applications against node failures. Very large scale computing systems, ranging from large clusters to worldwide Global Computing systems, require a high level of fault tolerance in order to run parallel applications efficiently. The Channel Memory approach provides the infrastructure for scalable tolerance to simultaneous faults. Combined with a specially designed checkpointing system and recovery protocol, this approach has resulted in the MPICH-V architecture. In this paper, we describe CMDE, a stand-alone distributed program system that is based on the MPICH-V architecture and implements an approach to tolerating faults of the Channel Memories themselves.
Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 21, Issue 5, May 2005, Pages 709-715