Article ID Journal Published Year Pages File Type
455727 Computers & Electrical Engineering 2013 19 Pages PDF
Abstract

As the size of large-scale computer systems increases, their mean-time-between-failures are becoming significantly shorter than the execution time of many current scientific applications. Fault-tolerant parallel algorithm (FTPA) is an application-level fault-tolerant approach that can achieve fast self-recovery by parallel recomputing. The method of parallelizing the loops has been used to design the parallel recomputing code for FTPA in our prior work.In the present paper, we first propose a new parallel recomputing code design methodology. Second, the parallel recomputing code design methodology is automated by exploring the use of compiler technology. Finally, we evaluate the performance of our approach with five programs on Tianhe-1A. The experimental results show that the parallel recomputing code generated by the new method has a higher efficiency of parallel recomputing.

Related Topics
Physical Sciences and Engineering Computer Science Computer Networks and Communications
Authors
, , ,