Article code: 425912
Journal code: 685951
Publication year: 2014
English article: 9 pages, PDF
Full-text version: free download
English title of the ISI article
Fault-tolerant virtual cluster experiments on federated sites using BonFIRE
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• A new proposal for a fault-tolerant virtual cluster architecture for the Cloud.
• A new Elasticity Engine that uses the application performance.
• Elasticity of virtual clusters using application performance monitoring.
• Experimental results on using elasticity to fulfill a Specific Deadline Objective.
• Fault-tolerance experiment results using BonFIRE’s federated infrastructure.

Cloud site failures and variable virtual machine (VM) performance are two issues that software providers must take into account. To guarantee that results reach their customers on time, providers must design their virtual infrastructure to adapt to changing conditions. This is especially critical for compute-intensive applications that run on virtual clusters with a large number of VMs, because they can need hours or days to produce valid results. Changes in performance can mean longer times to produce results and, probably, higher costs. Site failures usually force a restart from the beginning, losing many computing hours. In this paper we present a fault-tolerant virtual cluster architecture that tackles both issues in the context of compute-intensive bag-of-tasks applications. It includes an Elasticity Engine that uses the application performance to decide whether to enlarge or shrink the virtual cluster so that it meets the expectations of the final users. The architecture has been tested in three experiments: a multi-site execution of the application, which showed no penalty from running in a distributed environment; a Specific Deadline Objective experiment, in which the Elasticity Engine decides to enlarge the cluster with new VMs so that the simulation ends on time; and a fault-tolerance test in which one part of a distributed virtual cluster is lost and the application performance is restored on the surviving Cloud site using recovery mechanisms and elasticity rules, without interrupting the service.
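
As a rough illustration of the deadline-driven elasticity idea in the abstract, the sketch below sizes a virtual cluster from a measured application throughput. It is not the paper's Elasticity Engine: the function names, the tasks-per-second metric, and the `max_vms` cap are all assumptions made for this example, and it assumes every VM contributes the same measured rate.

```python
import math


def vms_needed(tasks_remaining, seconds_to_deadline, tasks_per_vm_per_second):
    """Smallest VM count that clears the remaining bag of tasks before the deadline.

    Hypothetical helper: assumes all VMs process tasks at the same measured rate.
    """
    if tasks_remaining <= 0:
        return 0
    if seconds_to_deadline <= 0 or tasks_per_vm_per_second <= 0:
        raise ValueError("deadline and per-VM rate must be positive")
    required_rate = tasks_remaining / seconds_to_deadline
    return math.ceil(required_rate / tasks_per_vm_per_second)


def scaling_decision(current_vms, tasks_remaining, seconds_to_deadline,
                     tasks_per_vm_per_second, max_vms):
    """Return how many VMs to add (positive) or release (negative).

    The target is kept between 1 VM and max_vms (an assumed quota or budget cap).
    """
    wanted = vms_needed(tasks_remaining, seconds_to_deadline,
                        tasks_per_vm_per_second)
    target = min(max_vms, max(1, wanted))
    return target - current_vms


# Deadline case: 8 VMs are running but 20 are needed, so ask for 12 more.
delta_deadline = scaling_decision(8, 1000, 100, 0.5, max_vms=32)

# Site-loss case: half of the cluster disappears, so the surviving site
# has to grow by the missing capacity to keep the same deadline.
delta_after_failure = scaling_decision(4, 1000, 100, 0.5, max_vms=32)
```

The same rule covers both scenarios described above: a slowdown lowers the measured per-VM rate and a site failure lowers `current_vms`, and in either case the difference to the target size is what would be requested from the surviving site.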

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 34, May 2014, Pages 17–25