On termination detection in crash-prone distributed systems with failure detectors

Article code	Journal code	Publication year	English article	Full-text version
432540	688935	2008	21-page PDF	Free download

Related topics

Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics

Abstract

We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm A that has been designed for a failure-free environment into a termination detection algorithm B that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system, even if at most one process can crash.

Let μ(n, M) and δ(n, M) denote the message complexity and detection latency, respectively, of A when the system has n processes and the underlying computation exchanges M application messages. The message complexity of B is O(n + μ(n, 0)) messages per failure more than the message complexity of A. Also, its detection latency is O(δ(n, 0)) per failure more than that of A. Furthermore, application message size increases by at most log(f + 1) bits, where f is the actual number of processes that fail during an execution. We show that, when the communication topology is fully connected, under certain realistic assumptions, any fault-tolerant termination detection algorithm can be forced to exchange Ω(nf) control messages in the worst case, even when at most one process may be active initially and the underlying computation does not exchange any application messages. This implies that our transformation is optimal in terms of message complexity when μ(n, 0) = O(n).

The fault-tolerant termination detection algorithm resulting from the transformation satisfies three desirable properties. First, it can tolerate the failure of up to n − 1 processes. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash. Third, it does not block the application at any time. Further, using our transformation, we derive a fault-tolerant termination detection algorithm that is, to our knowledge, the most efficient fault-tolerant termination detection algorithm proposed so far. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system.
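
The optimality claim in the abstract follows from simple arithmetic on the stated bounds, which the short Python sketch below illustrates. It is only an illustration under stated assumptions: the function names are hypothetical, all constants hidden by the O and Ω notation are set to 1, the logarithm is taken base 2 and rounded up to whole bits, and μ(n, 0) = c·n is assumed for an arbitrary constant c. It does not implement the transformation itself, which the abstract does not describe.

```python
def piggyback_bits(f: int) -> int:
    """Smallest whole number of bits that can distinguish f + 1 values,
    i.e. ceil(log2(f + 1)). Assumes a base-2 logarithm."""
    if f < 0:
        raise ValueError("f must be non-negative")
    # For f >= 0, f.bit_length() equals ceil(log2(f + 1)) without float error.
    return f.bit_length()


def upper_bound_term(n: int, f: int, mu0: int) -> int:
    """Term inside the O(.) for the extra messages of B over f failures:
    f * (n + mu(n, 0)). Hidden constants are assumed to be 1."""
    return f * (n + mu0)


def lower_bound_term(n: int, f: int) -> int:
    """Term inside the Omega(.) lower bound on control messages: n * f."""
    return n * f


if __name__ == "__main__":
    c = 2  # hypothetical constant: assume mu(n, 0) = c * n
    for n, f in [(8, 1), (64, 3), (1024, 10)]:
        upper = upper_bound_term(n, f, c * n)
        lower = lower_bound_term(n, f)
        # The ratio stays at 1 + c regardless of n and f, so the upper
        # bound term is within a constant factor of the lower bound term.
        print(n, f, piggyback_bits(f), upper, lower, upper / lower)
```

Because f·(n + c·n) = (1 + c)·nf, the per-failure overhead summed over f failures grows no faster than the Ω(nf) lower bound whenever μ(n, 0) = O(n), which is the sense in which the abstract calls the transformation optimal in message complexity.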

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 68, Issue 6, June 2008, Pages 855–875

Authors

Neeraj Mittal, Felix C. Freiling, S. Venkatesan, Lucia Draque Penso
