Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6873311 | Future Generation Computer Systems | 2018 | 24 Pages | |
Abstract
Monitoring is key to guaranteeing the reliability of cloud computing systems. By analyzing monitoring data, administrators can understand a system's status and detect, diagnose, and resolve problems. However, because of the enormous scale and complex structure of cloud computing, a monitoring system must collect, transfer, store, and process a large amount of monitoring data, which incurs significant performance overhead and makes it harder to extract useful information. To address this issue, this paper proposes a self-adaptive monitoring approach for cloud computing systems. First, we conduct correlation analysis between different metrics and monitor selected important ones, which represent the others and reflect the running status of a system. Second, we characterize the running status with Principal Component Analysis (PCA), estimate the anomaly degree, and predict the possibility of faults. Finally, we dynamically adjust the monitoring period based on the estimated anomaly degree and a reliability model. To evaluate our proposal, we applied the approach to Bench4Q, our open-source TPC-W benchmark, deployed on our real cloud computing platform, OnceCloud. The experimental results demonstrate that our approach can adapt to dynamic workloads, accurately estimate the anomaly degree, and automatically adjust monitoring periods. Thus, the approach can effectively improve the accuracy and timeliness of anomaly detection when the system is in an abnormal status, and efficiently lower the monitoring overhead when it is in a normal status.
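As a rough illustration of two of the steps described in the abstract (correlation-based metric selection and anomaly-driven adjustment of the monitoring period), the following minimal Python sketch may help. It is not the authors' implementation. The function names, the 0.9 correlation threshold, the greedy selection rule, the 1 to 60 second period bounds, and the linear mapping from anomaly degree to period are all assumptions made for illustration; the paper itself estimates the anomaly degree with PCA and adjusts the period with a reliability model, neither of which is reproduced here. The anomaly degree is simply taken as an input in [0, 1].

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def select_metrics(samples, threshold=0.9):
    """Greedily keep a metric only if it is not strongly correlated
    with any metric already kept (hypothetical selection rule)."""
    kept = []
    for name, series in samples.items():
        if all(abs(pearson(series, samples[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def adjust_period(anomaly_degree, min_period=1.0, max_period=60.0):
    """Map an anomaly degree in [0, 1] to a monitoring period in seconds.
    Linear interpolation is an assumed stand-in for the reliability model."""
    d = min(max(anomaly_degree, 0.0), 1.0)  # clamp out-of-range inputs
    return max_period - d * (max_period - min_period)


# Made-up sample data: cpu_total is nearly redundant with cpu_user.
samples = {
    "cpu_user": [10, 20, 30, 40, 50],
    "cpu_total": [12, 22, 33, 41, 52],
    "disk_io": [5, 1, 4, 2, 3],
}
print(select_metrics(samples))
print(adjust_period(0.0), adjust_period(0.5), adjust_period(1.0))
```

With these made-up samples, `cpu_total` is dropped because it is almost perfectly correlated with `cpu_user`, while `disk_io` is kept. A higher anomaly degree yields a shorter period, so sampling becomes more frequent when the system looks abnormal and less frequent when it looks normal, which is the trade-off the abstract describes.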
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Tao Wang, Jiwei Xu, Wenbo Zhang, Zeyu Gu, Hua Zhong