Article ID Journal Published Year Pages File Type
6872942 Future Generation Computer Systems 2018 18 Pages PDF
Abstract
As data-intensive cluster computing systems like MapReduce grow in popularity, there is a strong need to promote the efficiency. Recurring queries, repeatedly being executed for long periods of time on rapidly evolving data-intensive workloads, have become a bedrock component in big data analytic applications. Consequently, this paper presents optimization strategies for recurring queries for big data analysis. Firstly, it analyzes the impact of recurring queries efficiency by MapReduce recurring queries model. Secondly, it proposes the MapReduce consistent window slice algorithm, which can not only create more opportunities for reuse of recurring queries, but also greatly reduce redundant data while loading input data by the fine-grained scheduling. Thirdly, in terms of data scheduling, it designs the MapReduce late scheduling strategy that improve data processing and optimize computation resource scheduling in MapReduce cluster. Finally, it constructs the efficient data reuse execution plans by MapReduce recurring queries reuse strategy. The experimental results on a variety of workloads show that the algorithms outperform the state-of-the-art approaches.
Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, , ,