کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
431805 | 688630 | 2007 | 14 صفحه PDF | دانلود رایگان |

If the variables used for a checkpointing algorithm have data faults, the existing checkpointing and recovery algorithms may fail. In this paper, self-stabilizing data fault detecting and correcting, checkpointing, and recovery algorithms are proposed in a ring topology. The proposed data fault detection and correction algorithms can handle data faults; at most one per process, but in any number of processes. The proposed checkpointing algorithm can deal with concurrent multiple initiations of checkpointing and data faults. A process can recover from a fault, using the proposed recovery algorithm in spite of multiple data faults present in the system. All the proposed algorithms converge in O(n) steps, where n is the number of processes. The algorithm can be extended to work for general topologies too.
Journal: Journal of Parallel and Distributed Computing - Volume 67, Issue 7, July 2007, Pages 816-829