Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
492720 | Procedia Technology | 2014 | 6 Pages |
Abstract
High availability in complex distributed systems is a challenge that must be addressed not only at design time but also at execution time. This paper proposes an automated, event-driven, rule-based monitoring architecture whose goal is to minimize system unavailability. Translating operators' expertise into rules, and providing an appropriate interface for operating services, allows us to execute self-healing actions that anticipate or correct faults as quickly as possible.
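The abstract gives no implementation details, so the following is only a minimal sketch of what an event-driven, rule-based monitor with self-healing actions might look like. Every name in it (`Event`, `Rule`, `RuleEngine`, `restart_service`, the `heartbeat_age_s` metric and the 30-second threshold) is a hypothetical chosen for illustration and is not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    """A monitoring event emitted by a service (hypothetical shape)."""
    service: str
    metric: str
    value: float


@dataclass
class Rule:
    """Operator expertise encoded as a condition plus a corrective action."""
    name: str
    condition: Callable[[Event], bool]
    action: Callable[[Event], str]


@dataclass
class RuleEngine:
    """Evaluates every registered rule against each incoming event."""
    rules: List[Rule] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def handle(self, event: Event) -> List[str]:
        """Run the action of each matching rule; return the names of fired rules."""
        fired = []
        for rule in self.rules:
            if rule.condition(event):
                outcome = rule.action(event)
                self.log.append(f"{rule.name}: {outcome}")
                fired.append(rule.name)
        return fired


def restart_service(event: Event) -> str:
    """Placeholder self-healing action; a real system would call the
    service's operation interface here (assumption, not from the paper)."""
    return f"restart requested for {event.service}"


def build_engine() -> RuleEngine:
    """Assemble an engine with one illustrative rule."""
    engine = RuleEngine()
    engine.register(
        Rule(
            name="stale-heartbeat",
            # Assumed threshold: a heartbeat older than 30 seconds is a fault.
            condition=lambda e: e.metric == "heartbeat_age_s" and e.value > 30,
            action=restart_service,
        )
    )
    return engine
```

Calling `build_engine().handle(Event("billing", "heartbeat_age_s", 45.0))` would fire the single rule and record the requested restart in `engine.log`. Keeping conditions and actions as plain callables is one simple way to let operators add rules without changing the engine itself.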
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)