Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
428993 | Information Processing Letters | 2012 | 6 Pages | |
In this paper, a checkpointing protocol based on loose synchronization is proposed. The protocol enables processes to take checkpoints at different frequencies so that each process can control its own rollback distance. In traditional asynchronous and quasi-synchronous checkpointing protocols, checkpoints that are not up to date may be used for recovery, so the rollback distance is often difficult to control. In the proposed protocol, the checkpoint cycle of each process is dynamically adjusted using a pessimistic scheme so that strict 1-rollback is achieved; that is, one of the last two checkpoints of each process can always be used for recovery.
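
The 1-rollback property stated above can be illustrated with a minimal sketch. This is not the protocol itself, which the abstract does not detail; it only checks the stated guarantee against a given recovery line. The names `rollback_distance`, `satisfies_one_rollback`, `history`, and `recovery_line` are hypothetical, and checkpoints are assumed to be identified by unique sequence numbers listed oldest first.

```python
from typing import Dict, List


def rollback_distance(checkpoints: List[int], chosen: int) -> int:
    """Return how many checkpoints are newer than the chosen one.

    `checkpoints` is assumed to hold unique sequence numbers, oldest first,
    so a distance of 0 means the latest checkpoint was chosen.
    """
    return len(checkpoints) - 1 - checkpoints.index(chosen)


def satisfies_one_rollback(
    history: Dict[str, List[int]], recovery_line: Dict[str, int]
) -> bool:
    """Check that every process recovers from one of its last two checkpoints."""
    return all(
        rollback_distance(history[proc], recovery_line[proc]) <= 1
        for proc in history
    )


# Hypothetical example: p0 checkpoints more often than p1.
history = {"p0": [1, 2, 3, 4], "p1": [1, 2]}
good_line = {"p0": 3, "p1": 2}  # p0 rolls back one checkpoint, p1 none
bad_line = {"p0": 2, "p1": 2}   # p0 would roll back two checkpoints
```

Under these assumptions, `good_line` satisfies the property while `bad_line` does not, since `p0` would have to discard its two most recent checkpoints.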
► We propose a pessimistic multi-cycle checkpointing protocol that allows processes to take checkpoints with different frequencies.
► We have proven that one of the last two checkpoints of each process can be utilized during recovery.
► We present a mathematical analysis of the checkpointing overhead.