Article code: 10332464
Journal code: 687541
Publication year: 2013
English article: 8 pages, PDF
Full-text version: free download
English title of the ISI article
On-line soft error correction in matrix-matrix multiplication
Persian translation of the title
اصلاح خطای نرم در خط در ضرب ماتریس (On-line soft error correction in matrix multiplication)
Keywords
Algorithm-based fault tolerance, matrix multiplication, fault-tolerant linear algebra
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract
Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Soft errors normally do not interrupt the execution of the affected program, but the affected computation results cannot be trusted any more. A well known technique to correct soft errors in matrix-matrix multiplication is algorithm-based fault tolerance (ABFT). While ABFT achieves much better efficiency than triple modular redundancy (TMR) - a traditional general technique to correct soft errors, both ABFT and TMR detect errors off-line after the computation is finished. This paper extends the traditional ABFT technique from off-line to on-line so that soft errors in matrix-matrix multiplication can be detected in the middle of the computation during the program execution and higher efficiency can be achieved by correcting the corrupted computations in a timely manner. Experimental results demonstrate that the proposed technique can correct one error every ten seconds with negligible (i.e. less than 1%) performance penalty over the ATLAS dgemm().
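To illustrate the checksum idea behind ABFT that the abstract refers to, here is a minimal pure-Python sketch. It is not the paper's implementation. It assumes the classic row/column checksum encoding, an outer-product formulation of the multiplication, and at most one corrupted data element between two checks. The names `encode`, `find_and_fix`, `abft_matmul`, and the `fault` parameter are hypothetical and exist only for this example.

```python
def encode(A, B):
    """Append a column-checksum row to A and a row-checksum column to B."""
    Ac = [row[:] for row in A] + [[sum(col) for col in zip(*A)]]
    Br = [row[:] + [sum(row)] for row in B]
    return Ac, Br


def find_and_fix(Cf, tol=1e-9):
    """Locate and repair at most one corrupted data element of Cf.

    Cf is the full-checksum product: its last row and last column hold
    checksums. Only errors in data elements are handled here; an absolute
    tolerance like this one is an assumption suited to small test values.
    Returns the repaired (row, col), or None if nothing was repaired.
    """
    m, n = len(Cf) - 1, len(Cf[0]) - 1
    bad_rows = [i for i in range(m)
                if abs(sum(Cf[i][:n]) - Cf[i][n]) > tol]
    bad_cols = [j for j in range(n)
                if abs(sum(Cf[i][j] for i in range(m)) - Cf[m][j]) > tol]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the element from its row checksum and the other entries.
        Cf[i][j] = Cf[i][n] - sum(Cf[i][c] for c in range(n) if c != j)
        return (i, j)
    return None


def abft_matmul(A, B, check_every=1, fault=None):
    """Multiply A and B as a sum of outer products, checking checksums
    every `check_every` steps instead of only at the end.

    fault = (step, i, j, delta) injects a hypothetical soft error into
    element (i, j) after the given step, for demonstration only.
    """
    Ac, Br = encode(A, B)
    m, n, K = len(A), len(B[0]), len(B)
    Cf = [[0.0] * (n + 1) for _ in range(m + 1)]
    fixed = []
    for k in range(K):
        # Rank-1 update; the checksum relation holds after every step.
        for i in range(m + 1):
            for j in range(n + 1):
                Cf[i][j] += Ac[i][k] * Br[k][j]
        if fault is not None and fault[0] == k:
            Cf[fault[1]][fault[2]] += fault[3]
        if (k + 1) % check_every == 0 or k == K - 1:
            hit = find_and_fix(Cf)
            if hit is not None:
                fixed.append((k, hit))
    # Strip the checksum row and column to recover the plain product.
    C = [row[:n] for row in Cf[:m]]
    return C, fixed


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C, fixed = abft_matmul(A, B, fault=(0, 0, 1, 100.0))
    print(C, fixed)
```

Because the checksum relation holds after every rank-1 update, the check can run at any intermediate step, which is what makes detection during the computation possible in this sketch. A larger `check_every` lowers the checking overhead at the cost of detecting errors later.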
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Computational Science - Volume 4, Issue 6, November 2013, Pages 465-472