Rules-based Self Support Operation in Complex Infrastructures

Article ID	Journal	Published Year	Pages	File Type
492720	Procedia Technology	2014	6 Pages	PDF

Abstract

High availability in complex distributed systems is a challenge that must be addressed not only at design time but also at execution time. This paper proposes an automated, event-driven, rule-based monitoring architecture whose goal is to minimize system unavailability. Translating operators' expertise into rules, and providing an appropriate interface for operating services, allows us to execute self-healing actions that anticipate or correct faults as quickly as possible.
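
To make the idea concrete, the following is a minimal sketch of an event-driven, rule-based self-healing loop. It is not the architecture described in the paper: the class names (`Event`, `Rule`, `RuleEngine`), the metric names, the thresholds, and the two actions are all hypothetical and chosen only for illustration. Each rule stands for one piece of operator expertise, pairing a condition on an incoming monitoring event with a corrective action.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    """A monitoring event emitted by a service (hypothetical shape)."""
    service: str
    metric: str
    value: float


@dataclass
class Rule:
    """One piece of operator expertise: a condition plus a corrective action."""
    name: str
    condition: Callable[[Event], bool]
    action: Callable[[Event], str]


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def handle(self, event: Event) -> List[str]:
        """Run the action of every rule whose condition matches the event."""
        taken = [rule.action(event) for rule in self.rules if rule.condition(event)]
        self.log.extend(taken)  # keep an audit trail of the actions taken
        return taken


def restart_service(event: Event) -> str:
    # Placeholder: a real action would call the service's operation interface.
    return f"restart {event.service}"


def clean_tmp(event: Event) -> str:
    # Placeholder for a preventive action taken before the disk fills up.
    return f"clean-tmp {event.service}"


# Assumed thresholds (30 s without a heartbeat, 90 % disk usage) are examples only.
engine = RuleEngine(rules=[
    Rule("heartbeat-lost",
         lambda e: e.metric == "heartbeat_age_s" and e.value > 30,
         restart_service),
    Rule("disk-nearly-full",
         lambda e: e.metric == "disk_used_pct" and e.value >= 90,
         clean_tmp),
])

print(engine.handle(Event("db", "heartbeat_age_s", 45.0)))
print(engine.handle(Event("web", "disk_used_pct", 95.0)))
```

In this sketch the first rule corrects a fault that has already occurred (a lost heartbeat), while the second anticipates one (a disk about to fill), mirroring the two kinds of self-healing action the abstract mentions. Keeping the actions behind plain callables is one way to model the "interface for operating services" without committing to any particular transport.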