Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
395756 | Information Sciences | 2008 | 8 Pages | |
Abstract
Checkpointing and rollback recovery are established techniques for handling failures in distributed systems. Under synchronous checkpointing, every process involved in the distributed computation takes a checkpoint at almost the same time. This causes contention for the network stable storage and hence degrades performance, because processes may have to wait a long time for the checkpointing operation to complete. In this paper, we propose a staggered quasi-synchronous checkpointing algorithm that reduces contention for the network stable storage without any synchronization overhead.
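The contention argument can be made concrete with a toy model. The sketch below is not the paper's algorithm. It only models the storage contention that staggering is meant to avoid, under loudly stated assumptions: every checkpoint has the same size, and the shared stable storage divides its bandwidth equally among all concurrent writers. The function `simulate`, the `Write` record, and the sizes and bandwidth are hypothetical illustrations. With n processes writing checkpoints of size S to storage of bandwidth B at the same instant, each write takes n·S/B. If the start times are staggered by S/B, each write takes only S/B.

```python
from dataclasses import dataclass


@dataclass
class Write:
    """One process's checkpoint write to shared stable storage (hypothetical model)."""
    pid: int
    start: float      # time the process starts writing its checkpoint (s)
    remaining: float  # checkpoint data still to be written (MB)
    finish: float = -1.0


def simulate(starts, size_mb, bandwidth_mb_s):
    """Assumed equal-share model: active writers split the storage bandwidth evenly.

    Returns {pid: seconds spent writing the checkpoint} for each process.
    """
    writes = [Write(pid, s, size_mb) for pid, s in enumerate(starts)]
    pending = sorted(writes, key=lambda w: w.start)
    active = []
    t = 0.0
    while pending or active:
        # Admit every write whose start time has been reached.
        while pending and pending[0].start <= t:
            active.append(pending.pop(0))
        if not active:
            t = pending[0].start  # storage idle: jump to the next start
            continue
        rate = bandwidth_mb_s / len(active)  # per-writer share of bandwidth
        # Next event: the earliest completion or the next arrival.
        t_done = t + min(w.remaining for w in active) / rate
        t_next = min(t_done, pending[0].start) if pending else t_done
        for w in active:
            w.remaining -= rate * (t_next - t)
        t = t_next
        done = [w for w in active if w.remaining <= 1e-9]
        for w in done:
            w.finish = t
            active.remove(w)
    return {w.pid: w.finish - w.start for w in writes}


if __name__ == "__main__":
    n, size, bw = 4, 100.0, 50.0
    # Synchronous: all processes checkpoint at t = 0.
    print(simulate([0.0] * n, size, bw))  # {0: 8.0, 1: 8.0, 2: 8.0, 3: 8.0}
    # Staggered: process i starts at i * size / bw.
    print(simulate([i * size / bw for i in range(n)], size, bw))  # {0: 2.0, 1: 2.0, 2: 2.0, 3: 2.0}
```

With four processes, 100 MB checkpoints, and 50 MB/s of shared bandwidth, each synchronous write blocks its process for 8 s, while each staggered write takes 2 s. The model deliberately omits what makes a protocol quasi-synchronous, such as how consistency is preserved when processes checkpoint at different times.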
Authors
D. Manivannan, Q. Jiang, Jianchang Yang, M. Singhal