Article ID: 4950176
Journal: Future Generation Computer Systems
Published Year: 2017
Pages: 29 Pages
File Type: PDF
Abstract
Cloud datacenters host hundreds of thousands of physical servers that offer computing resources for executing customer jobs. While failures of these physical machines are considered normal rather than exceptional in large-scale distributed systems and cloud datacenters, evaluating the availability of a datacenter is essential for both cloud providers and customers. Providing a highly available and reliable computing infrastructure is essential to maintaining customer confidence, yet cloud providers also want highly utilized datacenters in order to increase the profit from delivered services. Cloud architectures should therefore account for both high availability for customers and high resource utilization, which makes service delivery more profitable for providers. This paper presents a highly reliable cloud architecture that leverages the 80/20 rule (80% of cluster failures come from 20% of physical machines) to identify failure-prone physical machines by dividing each cluster into reliable and risky sub-clusters. Customer jobs are likewise divided into latency-sensitive and latency-insensitive types. The results showed that only about 1% of all requested jobs are extremely latency-sensitive and require 99.999% availability. By serving revenue-generating jobs, which make up less than 50% of all requested jobs, within the reliable sub-cluster of physical machines, cloud providers can make their businesses more profitable by avoiding service level agreement violation penalties and improving their reputations.
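The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the per-machine failure counts, the fixed 20% cut-off, and the rule that latency-sensitive jobs go to the reliable sub-cluster are all assumptions made here for illustration. The last helper only converts an availability target such as 99.999% into the downtime it permits per year.

```python
from typing import Dict, List, Tuple


def split_cluster(failure_counts: Dict[str, int],
                  risky_fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Split machines into (reliable, risky) by historical failure count.

    Hypothetical helper: the most failure-prone `risky_fraction` of the
    machines form the risky sub-cluster, the rest the reliable one.
    """
    # Rank by failures, highest first; ties broken by machine ID for determinism.
    ranked = sorted(failure_counts, key=lambda m: (-failure_counts[m], m))
    # round() avoids floating-point surprises; keep at least one risky machine.
    n_risky = max(1, round(len(ranked) * risky_fraction))
    return ranked[n_risky:], ranked[:n_risky]


def failure_share(failure_counts: Dict[str, int], machines: List[str]) -> float:
    """Fraction of all recorded failures that occurred on `machines`."""
    total = sum(failure_counts.values())
    if total == 0:
        return 0.0
    return sum(failure_counts[m] for m in machines) / total


def place_job(latency_sensitive: bool) -> str:
    """Assumed placement rule: latency-sensitive jobs use the reliable sub-cluster."""
    return "reliable" if latency_sensitive else "risky"


def allowed_downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget implied by an availability target (365-day year)."""
    return (1.0 - availability) * 365 * 24 * 60


# Made-up failure log: two of ten machines account for 80 of 100 failures.
counts = {"m0": 45, "m1": 35, "m2": 2, "m3": 3, "m4": 2,
          "m5": 3, "m6": 2, "m7": 3, "m8": 2, "m9": 3}
reliable, risky = split_cluster(counts)
print(risky, failure_share(counts, risky))
print(place_job(latency_sensitive=True))
print(round(allowed_downtime_minutes_per_year(0.99999), 2))
```

With this made-up data, the two most failure-prone machines form the risky sub-cluster, and a 99.999% availability target leaves a budget of roughly 5.26 minutes of downtime per year.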
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics