Article ID: 492720
Journal: Procedia Technology
Published Year: 2014
Pages: 6 Pages
File Type: PDF
Abstract

High availability in complex distributed systems is a challenge that must be addressed not only at design time but also at execution time. This paper proposes an automated, event-driven, rule-based monitoring architecture whose goal is to minimize system unavailability. Translating operators' expertise into rules, and providing an appropriate interface for operating services, allows us to execute self-healing actions that anticipate or correct faults as quickly as possible.
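
As a rough illustration of the idea described above, the following minimal sketch shows how operator knowledge might be encoded as rules that react to monitoring events and trigger self-healing actions. It is not the architecture from the paper: the class names (`Event`, `Rule`, `RuleEngine`), the metrics (`heartbeat_age_s`, `queue_depth`), the thresholds, and the action functions are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    """A monitoring event emitted by a service (all fields are illustrative)."""
    service: str
    metric: str
    value: float


@dataclass
class Rule:
    """Operator expertise encoded as a condition plus a self-healing action."""
    name: str
    condition: Callable[[Event], bool]
    action: Callable[[Event], str]


@dataclass
class RuleEngine:
    """Evaluates every registered rule against each incoming event."""
    rules: List[Rule] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def handle(self, event: Event) -> List[str]:
        """Run the action of each matching rule and return the actions taken."""
        taken = [rule.action(event) for rule in self.rules if rule.condition(event)]
        self.log.extend(taken)
        return taken


def restart_service(event: Event) -> str:
    # Placeholder: a real system would call the service operation interface here.
    return f"restart {event.service}"


def scale_out(event: Event) -> str:
    # Placeholder for a preventive action taken before a fault occurs.
    return f"scale out {event.service}"


engine = RuleEngine()
# Corrective rule (assumed threshold): no heartbeat for more than 30 seconds.
engine.register(Rule("heartbeat-lost",
                     lambda e: e.metric == "heartbeat_age_s" and e.value > 30,
                     restart_service))
# Anticipatory rule (assumed threshold): a backlog is building up.
engine.register(Rule("queue-growing",
                     lambda e: e.metric == "queue_depth" and e.value > 1000,
                     scale_out))

actions = engine.handle(Event("billing", "heartbeat_age_s", 45.0))
```

In this sketch the first rule corrects a fault that has already happened, while the second anticipates one; keeping conditions and actions as plain callables is one simple way to let operators add rules without changing the engine itself.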

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)