Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
432358 | 688865 | 2014 | 17-page PDF | Free download |
• We have improved the text in many places, based upon all the suggestions.
• We have added a new set of simulations based upon actual failure traces.
• We have added a new set of simulations to deal with inaccurate prediction dates.
This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly in the presence of a fault prediction system, characterized by its recall and its precision. In this framework, we provide optimal algorithms to decide whether and when to take predictions into account, and we derive the optimal value of the checkpointing period. These results allow us to analytically assess the key parameters that impact the performance of fault predictors at very large scale.
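As background for the quantities named in the abstract, the following is a minimal sketch, assuming Young's classical first-order approximation of the checkpointing period, T = sqrt(2 · μ · C), where μ is the platform mean time between failures and C is the checkpoint cost, together with the standard definitions of recall and precision. The function and parameter names are hypothetical and chosen for illustration; the paper's own prediction-aware period and algorithms are not reproduced here.

```python
import math


def young_period(mtbf: float, checkpoint_cost: float) -> float:
    """Young's first-order checkpointing period: sqrt(2 * mtbf * checkpoint_cost).

    Both arguments must use the same time unit (for example, seconds).
    This is the classical result without any fault predictor.
    """
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual faults that the predictor announced."""
    total_faults = true_positives + false_negatives
    if total_faults == 0:
        raise ValueError("no faults observed")
    return true_positives / total_faults


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predictions that corresponded to an actual fault."""
    total_predictions = true_positives + false_positives
    if total_predictions == 0:
        raise ValueError("no predictions made")
    return true_positives / total_predictions


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not taken from the paper).
    print(young_period(mtbf=50.0, checkpoint_cost=1.0))
    print(recall(true_positives=8, false_negatives=2))
    print(precision(true_positives=8, false_positives=8))
```

A predictor with high recall leaves fewer faults unannounced, while high precision means fewer false alarms; the abstract indicates that both parameters enter the analysis of when a prediction is worth acting on.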
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 2, February 2014, Pages 2048–2064