Article ID: 428606
Journal: Information Processing Letters
Published Year: 2011
Pages: 6
File Type: PDF
Abstract

A general checkpoint scheduling algorithm is proposed that determines a near-optimal sequence of checkpointing times so as to minimize the expected execution time. More precisely, an analytical model of execution is introduced on the basis of a simple timing policy, and the expected effective ratio is derived from it. Maximizing the expected effective ratio yields the optimal checkpoint period directly for the exponential failure distribution, and a general checkpoint scheduling algorithm is developed to produce a near-optimal checkpointing time sequence for an arbitrary failure distribution. Experimental results show that the proposed algorithm varies the checkpoint interval according to the failure distribution, and that the expected effective ratio of the execution is considerable for long-running applications in terms of reliability.
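The exponential case can be illustrated with a minimal sketch. The abstract does not give the paper's execution model, so the code below assumes a common textbook renewal model instead: each segment consists of useful work T followed by a checkpoint of cost C, failures arrive at a constant rate lam, a failure restarts the current segment from the last checkpoint, and recovery itself costs nothing. Under those assumptions the expected time to complete one segment is (e^(lam(T+C)) - 1)/lam, and the effective ratio is T divided by that quantity. The names effective_ratio and best_period and the parameters C and lam are hypothetical and are not taken from the paper.

```python
import math


def effective_ratio(T, C, lam):
    """Useful work T divided by the expected wall-clock time to finish it.

    Assumed model (not the paper's): exponential failures at rate lam,
    a failure restarts the segment, the segment lasts T + C without failure.
    """
    expected_time = math.expm1(lam * (T + C)) / lam
    return T / expected_time


def best_period(C, lam, iters=200):
    """Ternary search for the T that maximizes effective_ratio.

    Under the assumed model the ratio is unimodal in T and the optimum
    satisfies lam * T < 1, so (0, 1/lam) brackets the maximum.
    """
    lo, hi = 1e-9, 1.0 / lam
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if effective_ratio(m1, C, lam) < effective_ratio(m2, C, lam):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


if __name__ == "__main__":
    C = 60.0             # assumed checkpoint cost in seconds
    lam = 1.0 / 86400.0  # assumed failure rate: one failure per day on average
    T = best_period(C, lam)
    print("period:", T, "ratio:", effective_ratio(T, C, lam))
    print("Young's approximation:", math.sqrt(2 * C / lam))
```

With a constant failure rate the best period is the same for every segment, which is why a single number suffices here. For a non-constant failure distribution the interval would have to change over time, which is the case the general scheduling algorithm in the abstract addresses.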

► A mathematical model is proposed to analyze the optimal checkpointing sequence.
► The effective ratio of a long-running application is defined and derived.
► A checkpoint scheduling algorithm is developed based on the mathematical model.
► A failure distribution instance is discussed to draw a conclusion.
► The final expected effective ratio of the execution is considerable in terms of reliability.
