کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
432684 689033 2015 12 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Robust network supercomputing with unreliable workers
ترجمه فارسی عنوان
سوپرپرداخت شبکه قوی با کارکنان غیر قابل اعتماد
کلمات کلیدی
الگوریتم های توزیع شده، تحمل خطا، الگوریتم های تصادفی، قابلیت اطمینان، ابررایانه اینترنتی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی


• We analyze a master–worker model of a distributed system, where workers may be unreliable.
• We propose two failure models in the above system and design algorithms for them.
• We provide complexity for work and number of messages delivered in the execution of the algorithm.
• We also provide a lower bound for work in one of the models.

Internet supercomputing is becoming a powerful tool for harnessing massive amounts of computational resources. However in typical master–worker settings the correctness of the results of the computation crucially relies on the ability of the master to depend on the computation performed by the workers. We consider a distributed system consisting of a master process and a collection of synchronous worker processes that can execute tasks on behalf of the master and that may act nefariously by deliberately returning fallacious results. The master decides on the correctness of the results by assigning the same task to several workers. For such a setting with nn processes and tt tasks we study the problem of collectively performing the tasks under two different failure models: model FaFa, where some fraction ff of workers that are prone to returning arbitrary (e.g., incorrect) results and the probability pp of such faulty behavior are not known a priori   to the master, and model FbFb when these quantities are known to the master. Previous works assume that the number of faulty processes or the probability of a process acting maliciously is known to the master, e.g., as in model FbFb. In this paper this assumption is removed in model FaFa. First, for model FaFa we provide an efficient algorithm-based on the Stopping Rule Algorithm by Dagum et al. (1995)—that can estimate ff and pp with (ϵ,δ)(ϵ,δ)-approximation, for any 0<δ<10<δ<1 and ϵ>0ϵ>0. We also provide a randomized algorithm for detecting the faulty processes for model FaFa. Finally, we provide algorithms to perform tt tasks, with nn workers, in models FaFa and FbFb.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Parallel and Distributed Computing - Volume 75, January 2015, Pages 81–92
نویسندگان
, , ,