Self-adaptive cloud monitoring with online anomaly detection

Article ID	Journal	Published Year	Pages	File Type
6873311	Future Generation Computer Systems	2018	24 Pages	PDF

Abstract

Monitoring is key to guaranteeing the reliability of cloud computing systems. By analyzing monitoring data, administrators can understand a system's status and detect, diagnose and solve problems. However, because of the enormous scale and complex structure of cloud computing, a monitoring system must collect, transfer, store and process a large amount of monitoring data, which incurs significant performance overhead and makes it harder to extract useful information. To address this issue, this paper proposes a self-adaptive monitoring approach for cloud computing systems. First, we conduct correlation analysis between different metrics and monitor only selected important ones, which represent the others and reflect the running status of a system. Second, we characterize the running status with Principal Component Analysis (PCA), estimate the anomaly degree, and predict the possibility of faults. Finally, we dynamically adjust the monitoring period based on the estimated anomaly degree and a reliability model. To evaluate our proposal, we applied the approach to Bench4Q, our open-source TPC-W benchmark, deployed on our real cloud computing platform OnceCloud. The experimental results demonstrate that our approach can adapt to dynamic workloads, accurately estimate the anomaly degree, and automatically adjust monitoring periods. Thus, the approach improves the accuracy and timeliness of anomaly detection when the system is in an abnormal status, and lowers the monitoring overhead when it is in a normal status.
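
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions that do not come from the source: the function names, the 0.9 correlation threshold, the use of the PCA reconstruction error as the anomaly degree, the scale constant that maps that error into [0, 1), and the 1 to 60 second period range are all hypothetical. In particular, a plain linear interpolation stands in for the paper's reliability model, which the abstract does not specify.

```python
import numpy as np


def select_representatives(data, threshold=0.9):
    """Greedily keep metrics not strongly correlated with one already kept.

    data has shape (n_samples, n_metrics); returns the kept column indices.
    The threshold is an assumed value, not one taken from the paper.
    """
    corr = np.abs(np.corrcoef(data, rowvar=False))
    kept = []
    for j in range(data.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept


def fit_pca(baseline, n_components=1):
    """Fit a PCA model (mean, std, principal directions) on normal-status samples."""
    mean = baseline.mean(axis=0)
    std = baseline.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant metrics
    z = (baseline - mean) / std
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]


def anomaly_degree(sample, model, scale=1.0):
    """Map the PCA reconstruction error of one sample into [0, 1).

    Assumption: the squared residual outside the retained principal
    subspace serves as the anomaly signal; scale is a hypothetical constant.
    """
    mean, std, components = model
    z = (sample - mean) / std
    residual = z - components.T @ (components @ z)
    err = float(residual @ residual)
    return err / (err + scale)


def next_period(degree, min_period=1.0, max_period=60.0):
    """Shorten the monitoring period as the anomaly degree rises.

    Linear interpolation is a stand-in for the paper's reliability model.
    """
    return max_period - degree * (max_period - min_period)
```

In this sketch, a sample that follows the correlation structure learned from the baseline yields an anomaly degree near 0 and therefore a long monitoring period, while a sample that breaks that structure yields a higher degree and a shorter period.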

Keywords

Correlation analysis; Anomaly detection; Cloud computing; Adaptive monitoring