Article code: 425438 | Journal code: 685738 | Publication year: 2006 | English article: 10 pages | Full text: PDF
ISI article title (English)
Reliability challenges in large systems
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics
Abstract

Clusters built from commodity PCs dominate high-performance computing today, and systems containing thousands of processors are now being deployed. As node counts for multi-teraflop systems grow to tens of thousands, and proposed petaflop systems are likely to contain hundreds of thousands of nodes, the assumption of fully reliable hardware and software becomes much less credible. In this paper, after presenting examples and experimental data that quantify the reliability of current systems, we describe possible approaches for using such systems effectively. In particular, we present techniques that detect imminent failures in the environment and allow an application to run successfully despite such failures. We also show how intelligent, adaptive software can lead to failure resilience and efficient system usage.
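
The scaling argument above can be illustrated with a common back-of-the-envelope model. It is not taken from the paper. The model assumes that node failures are independent and exponentially distributed, so per-node failure rates add and the system's mean time between failures (MTBF) is the node MTBF divided by the node count. The 5-year per-node MTBF and the 24-hour run length are hypothetical values chosen only for illustration.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Expected time until *any* node fails.

    Assumes independent, exponentially distributed node lifetimes, so the
    per-node failure rates (1 / node MTBF) simply add across nodes.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return node_mtbf_hours / num_nodes


def prob_run_survives(run_hours: float, node_mtbf_hours: float, num_nodes: int) -> float:
    """Probability that no node fails during a run of the given length
    (same independence/exponential assumption as above)."""
    return math.exp(-run_hours / system_mtbf_hours(node_mtbf_hours, num_nodes))


if __name__ == "__main__":
    NODE_MTBF = 5 * 365 * 24  # hypothetical 5-year per-node MTBF, in hours
    RUN_HOURS = 24            # hypothetical job length, in hours
    for n in (1_000, 10_000, 100_000):
        mtbf = system_mtbf_hours(NODE_MTBF, n)
        p = prob_run_survives(RUN_HOURS, NODE_MTBF, n)
        print(f"{n:>7} nodes: system MTBF {mtbf:8.2f} h, P(run survives) = {p:.3g}")
```

Under these assumptions, a tenfold increase in node count cuts the system MTBF by a factor of ten. As a result, long runs on very large machines become unlikely to finish without some node failing. This is the situation that failure detection and failure-tolerant execution are meant to address.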

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 22, Issue 3, February 2006, Pages 293–302
Authors