Article ID: 4956435
Journal: Journal of Systems and Software
Published Year: 2017
Pages: 28 Pages
File Type: PDF
Abstract
Large information systems comprise different interconnected hardware and software components that collectively generate large volumes of data. The run-time analysis of such data involves computationally expensive algorithms and is pivotal to a number of software engineering activities, such as system understanding, diagnostics, and root cause analysis. To increase the performance of run-time analysis for large sets of logged data, we present an approach that allows for the real-time reduction of one or more event streams by applying a set of filtering criteria. More specifically, the approach employs a similarity measure based on information theory principles, applied between the features of the incoming events and the features of a set of retrieved or constructed events that we refer to as beacons. The proposed approach is domain and event schema agnostic, can handle infinite data streams using a caching algorithm, and can be parallelized in order to tractably process high-volume, high-frequency, and high-variability data. Experimental results obtained using the KDD'99 and CTU-13 labeled data sets indicate that the approach is scalable and can yield highly reduced sets with high recall values with respect to a use case.
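The abstract does not give the similarity formula, so the following is only a minimal sketch of the general idea of beacon-based stream reduction, not the authors' method. It assumes a Lin-style information-theoretic similarity (shared information divided by total information), events represented as flat dictionaries with hashable values, and feature probabilities estimated from running counts. The names `BeaconFilter`, `feature_set`, `reduce_stream`, and the `threshold` parameter are hypothetical. The counter here grows without bound; the caching algorithm the abstract mentions for infinite streams is not reproduced.

```python
import math
from collections import Counter


def feature_set(event):
    """Turn a flat event dict into a set of (field, value) features."""
    return frozenset(event.items())


class BeaconFilter:
    """Keeps events that are similar enough to at least one beacon.

    Assumption: similarity is Lin-style, 2 * I(shared) / (I(a) + I(b)),
    where I(f) = -log p(f) and p(f) is estimated from the stream so far.
    """

    def __init__(self, beacons, threshold):
        self.beacons = [feature_set(b) for b in beacons]
        self.threshold = threshold
        self.counts = Counter()  # how often each feature has been seen
        self.seen = 0            # number of events processed

    def _info(self, feature):
        # Smoothed self-information; p is always strictly below 1,
        # so every feature carries a positive amount of information.
        p = (self.counts[feature] + 1) / (self.seen + 2)
        return -math.log(p)

    def similarity(self, a, b):
        total = sum(self._info(f) for f in a) + sum(self._info(f) for f in b)
        if total == 0:  # both feature sets are empty
            return 0.0
        shared = sum(self._info(f) for f in a & b)
        return 2 * shared / total

    def accept(self, event):
        feats = feature_set(event)
        # Update the running statistics before scoring the event.
        self.seen += 1
        self.counts.update(feats)
        return any(
            self.similarity(feats, beacon) >= self.threshold
            for beacon in self.beacons
        )


def reduce_stream(events, filt):
    """Lazily yield only the events that the filter accepts."""
    return (e for e in events if filt.accept(e))
```

Because `reduce_stream` is a generator, it processes one event at a time and can wrap an unbounded iterator. Rare shared features contribute more to the score than common ones, which is the property an information-theoretic measure is meant to provide.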
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications