Article code: 432743
Journal code: 689058
Publication year: 2013
English article: 11 pages
Full-text version: PDF
English title (ISI article)
Avoiding disruptive failovers in transaction processing systems with multiple active nodes
Related subjects
Engineering and Basic Sciences / Computer Engineering / Computational Theory and Mathematics
Abstract

We present a highly available system for environments such as stock trading, where high request rates and low-latency requirements mean that service disruptions lasting on the order of seconds can be unacceptable. After a node failure, our system avoids the processing delays that would otherwise come from detecting the failure or transferring control to a backup node. We achieve this by using multiple primary nodes that process transactions concurrently as peers. If a primary node fails, the remaining primaries continue executing without being delayed at all by the failed primary. Nodes agree on a total ordering for processing requests with a novel, low-overhead, wait-free algorithm that uses a small amount of shared memory accessible to the nodes and a simple compare-and-swap-like protocol, which allows the system to progress at the speed of the fastest node. We have implemented our system on an IBM z990 zSeries eServer mainframe and show experimentally that it performs well and can transparently handle node failures without delaying transaction processing. The efficient implementation of our algorithm for ordering transactions is a critically important factor in achieving good performance.

Highlights
► We design a novel total ordering algorithm using shared memory.
► We combine our total ordering algorithm with a primary–primary architecture to achieve non-disruptive failover.
► We implement our algorithm on a real system and demonstrate its feasibility in practice.
► We quantify the performance overhead of our total ordering algorithm.
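The abstract describes primaries agreeing on a total order through a small shared-memory region and a compare-and-swap-like protocol. The sketch below is not the authors' algorithm (its details are in the full paper); it is a minimal, generic illustration of the idea under stated assumptions: every primary receives every request (in its own arrival order), the shared region is a fixed array of slots, and compare-and-swap is simulated sequentially in a single thread rather than by a hardware instruction. The names `SharedSlots`, `Node`, `compare_and_swap`, `step`, and `run` are hypothetical. Each node settles the next slot with at most one CAS, so the fastest node ends up filling slots while slower nodes only read the winners, and a crashed node simply stops taking part.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EMPTY: Optional[str] = None  # marks a slot whose position in the order is still undecided


class SharedSlots:
    """Hypothetical stand-in for the small shared-memory region all nodes can access.

    Slot i holds the request ID chosen for position i of the total order. Real
    hardware would supply an atomic compare-and-swap instruction; this
    single-threaded simulation performs the compare and the write in one call.
    """

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[str]] = [EMPTY] * size

    def compare_and_swap(self, index: int, expected: Optional[str], new: str) -> Optional[str]:
        """Store `new` only if the slot still holds `expected`; return the prior value."""
        current = self.slots[index]
        if current == expected:
            self.slots[index] = new
        return current


@dataclass
class Node:
    name: str
    pending: List[str]  # requests received but not yet seen in the total order
    next_slot: int = 0  # next position of the total order this node will settle
    executed: List[str] = field(default_factory=list)
    alive: bool = True

    def step(self, shared: SharedSlots) -> None:
        """Settle one slot with at most one CAS; never waits for another node."""
        if not self.alive or self.next_slot >= len(shared.slots):
            return
        if self.pending:
            proposal = self.pending[0]
            prior = shared.compare_and_swap(self.next_slot, EMPTY, proposal)
            winner = proposal if prior is EMPTY else prior
        else:
            winner = shared.slots[self.next_slot]  # nothing to propose: only read
            if winner is EMPTY:
                return
        # Every node applies the same winner for this slot, so execution orders agree.
        self.executed.append(winner)
        if winner in self.pending:
            self.pending.remove(winner)
        self.next_slot += 1


def run(rounds: int, crash_round: int) -> List[Node]:
    """Round-robin schedule in which node A stops permanently at `crash_round`."""
    requests = ["r1", "r2", "r3", "r4", "r5", "r6"]
    shared = SharedSlots(size=len(requests))
    # Assumption: every node receives every request, but in its own arrival order.
    nodes = [
        Node("A", pending=list(requests)),
        Node("B", pending=list(reversed(requests))),
        Node("C", pending=requests[3:] + requests[:3]),
    ]
    for round_no in range(rounds):
        if round_no == crash_round:
            nodes[0].alive = False  # A fails; no detection or failover step is run
        for node in nodes:
            node.step(shared)
    return nodes


if __name__ == "__main__":
    a, b, c = run(rounds=8, crash_round=2)
    print(b.executed)  # ['r1', 'r2', 'r6', 'r5', 'r4', 'r3']
    print(c.executed == b.executed, a.executed)  # True ['r1', 'r2']
```

In this toy run, node A decides the first two slots and then fails; B and C carry on in the next round with no timeout or takeover step, and A's executed prefix is consistent with theirs. A real deployment would additionally need a genuinely atomic CAS on memory shared across machines and a way to recycle a bounded slot array; the sketch ignores both.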

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 5, May 2013, Pages 630–640