Article ID: 395756
Journal: Information Sciences
Published Year: 2008
Pages: 8
File Type: PDF
Abstract

Checkpointing and rollback recovery are established techniques for handling failures in distributed systems. Under synchronous checkpointing, every process involved in the distributed computation takes its checkpoint almost simultaneously. This causes contention for network stable storage and degrades performance, because processes may have to wait a long time for the checkpointing operation to complete. In this paper, we propose a staggered quasi-synchronous checkpointing algorithm that reduces contention for network stable storage without any synchronization overhead.
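
The contention effect described in the abstract can be illustrated with a toy queueing model. The sketch below is not the algorithm proposed in the paper. It assumes a single stable-storage server that writes one checkpoint at a time, an identical write time for every process, and hypothetical names (`checkpoint_waits`, `write_time`) chosen only for this illustration. It compares the waiting time when all processes start their checkpoints together with the waiting time when their start times are staggered by one write time.

```python
def checkpoint_waits(start_times, write_time):
    """Return the queueing delay each process sees at a serial storage server.

    start_times: times at which processes begin checkpointing (hypothetical units).
    write_time: time the server needs to store one checkpoint (assumed equal for all).
    """
    waits = []
    server_free_at = 0.0
    # Serve requests in order of arrival, one at a time.
    for start in sorted(start_times):
        begin = max(start, server_free_at)  # wait if the server is still busy
        waits.append(begin - start)
        server_free_at = begin + write_time
    return waits


n, write_time = 4, 2.0

# Synchronous: every process starts its checkpoint at the same instant.
synchronous = checkpoint_waits([0.0] * n, write_time)

# Staggered: process i starts after i write times, so requests never overlap.
staggered = checkpoint_waits([i * write_time for i in range(n)], write_time)

print("synchronous waits:", synchronous)
print("staggered waits:  ", staggered)
```

In this simplified model the simultaneous case makes the last process wait for every earlier write to finish, while the staggered case removes the queueing delay entirely. A real system would also have to account for message traffic and consistency of the recorded global state, which this sketch ignores.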

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence