Article code: 431623
Journal code: 688597
Publication year: 2016
English article: 17-page PDF
Full-text version: free download
English title of the ISI article
Coping with recall and precision of soft error detectors
Persian translation of the title
مقابله با فراخوان و دقت آشکارسازهای خطای نرم
Keywords
Fault tolerance; High-performance computing; Silent data corruption; Partial verification; Recall and precision; Exascale
Related topics
Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics
English abstract


• Resilience algorithms to cope with silent errors for HPC applications.
• Characterization of optimal patterns using partial error detectors.
• Imprecise detectors offer limited usefulness.
• Optimization problem is NP-complete with multiple detector types.
• Construction of an FPTAS and a greedy approximation algorithm.

Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each method comes with a cost, a recall (the fraction of all errors that are actually detected; a recall below 1 means false negatives), and a precision (the fraction of true errors among all detected errors; a precision below 1 means false positives). The main contribution of this paper is to characterize the optimal computing pattern for an application: which detector(s) to use, how many detectors of each type to use, and the length of the work segment that precedes each of them. We first prove that detectors with imperfect precision offer limited usefulness. We then focus on detectors with perfect precision and conduct a comprehensive complexity analysis of this optimization problem, showing NP-completeness and designing an FPTAS (Fully Polynomial-Time Approximation Scheme). On the practical side, we provide a greedy algorithm whose performance is shown to be close to optimal for a realistic set of evaluation scenarios. Extensive simulations illustrate the usefulness of detectors with false negatives, which are available at a lower cost than the guaranteed detectors.
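
The two detector metrics defined in the abstract can be illustrated with a short sketch. It is not taken from the paper: the function names and the counts are hypothetical, and the last helper assumes that successive checks by a partial detector miss an error independently of one another, which is a simplification made here for illustration only.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real errors that the detector flagged."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of the detector's alarms that were real errors."""
    return true_positives / (true_positives + false_positives)


def miss_probability(detector_recall: float, checks: int) -> float:
    """Chance that one error slips past `checks` detections.

    Assumption (not from the source): each check misses the error
    independently with probability 1 - detector_recall.
    """
    return (1.0 - detector_recall) ** checks


# Hypothetical counts: 100 real errors, 80 caught, no false alarms.
tp, fn, fp = 80, 20, 0
print("recall:", recall(tp, fn))
print("precision:", precision(tp, fp))
print("miss probability after 3 checks:", miss_probability(recall(tp, fn), 3))
```

With these made-up counts the detector has perfect precision but imperfect recall, which is the class of detectors the abstract concentrates on after setting aside imprecise ones.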

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 98, December 2016, Pages 8–24