کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
431717 688617 2014 12 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Fault-tolerant scheduling on parallel systems with non-memoryless failure distributions
ترجمه فارسی عنوان
برنامه ریزی تحمل پذیری در سیستم های موازی با توزیع های شکست بدون حافظه
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی


• If failure rates are decreasing, we prove that makespan and reliability are antagonistic.
• As there is no single optimal solution, we design an algorithm for computing the approximation set of the Pareto front.
• If failure rates are increasing, both makespan and reliability can be optimized at the same time.
• We prove that the scheduling problem can be solved optimally for several common failure distributions, such as Weibull.

As large parallel systems increase in size and complexity, failures are inevitable and exhibit complex space and time dynamics. Most often, in real systems, failure rates are increasing or decreasing over time. Considering non-memoryless failure distributions, we study a bi-objective scheduling problem of optimizing application makespan and reliability. In particular, we determine whether one can optimize both makespan and reliability simultaneously, or whether one metric must be degraded in order to improve the other. We also devise scheduling algorithms for achieving (approximately) optimal makespan or reliability. When failure rates decrease, we prove that makespan and reliability are opposing metrics. In contrast, when failure rates increase, we prove that one can optimize both makespan and reliability simultaneously. Moreover, we show that the largest processing time (LPT) list scheduling algorithm achieves good performance when processors are of uniform speed. The implications of our findings are the accelerated completion and improved reliability of parallel jobs executed across large distributed systems. Finally, we conduct simulations to investigate the impact of failures on the performance, which is done using an actual application of biological sequence comparison.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 5, May 2014, Pages 2411–2422
نویسندگان
, , , ,