The optimization for recurring queries in big data analysis system with MapReduce

Article ID	Journal	Published Year	Pages	File Type
6872942	Future Generation Computer Systems	2018	18 Pages	PDF

Abstract

As data-intensive cluster computing systems such as MapReduce grow in popularity, there is a strong need to improve their efficiency. Recurring queries, which are executed repeatedly over long periods on rapidly evolving data-intensive workloads, have become a bedrock component of big data analytic applications. This paper therefore presents optimization strategies for recurring queries in big data analysis. First, it analyzes what affects the efficiency of recurring queries, using a MapReduce recurring queries model. Second, it proposes the MapReduce consistent window slice algorithm, which not only creates more opportunities for reuse across recurring queries but also, through fine-grained scheduling, greatly reduces the redundant data read when loading input. Third, for data scheduling, it designs the MapReduce late scheduling strategy, which improves data processing and optimizes the scheduling of computation resources in a MapReduce cluster. Finally, it constructs efficient data reuse execution plans using the MapReduce recurring queries reuse strategy. Experimental results on a variety of workloads show that the proposed algorithms outperform state-of-the-art approaches.
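The abstract does not spell out the consistent window slice algorithm, so the following is only a generic illustration of why slicing a window helps recurring queries, not the paper's method. It assumes a sliding-window sum with window length `window` and recurrence interval `slide`, and cuts the timeline into slices whose length is the greatest common divisor of the two, so every window boundary falls on a slice boundary. The function names `slice_length` and `run_recurring_sum` and the sample records are hypothetical.

```python
from math import gcd


def slice_length(window, slide):
    # Assumption: slices of length gcd(window, slide) align with every
    # window start and end, so windows are exact unions of slices.
    return gcd(window, slide)


def run_recurring_sum(records, window, slide, horizon):
    """Evaluate a recurring windowed sum over (timestamp, value) records.

    Windows are half-open intervals [end - window, end) for
    end = window, window + slide, ... up to horizon.
    """
    g = slice_length(window, slide)

    # Pass 1: read each record once and fold it into its slice's partial sum.
    partial = {}
    for ts, value in records:
        idx = ts // g
        partial[idx] = partial.get(idx, 0) + value

    # Pass 2: each recurrence combines cached slice sums instead of
    # rescanning the raw records shared with earlier windows.
    results = []
    end = window
    while end <= horizon:
        first = (end - window) // g
        last = end // g
        total = sum(partial.get(i, 0) for i in range(first, last))
        results.append((end - window, end, total))
        end += slide
    return results


# Hypothetical data: one record per time unit, values 1..8.
records = [(t, t + 1) for t in range(8)]
results = run_recurring_sum(records, window=4, slide=2, horizon=8)
```

With a window of 4 and a slide of 2, consecutive windows overlap by half; in this sketch the overlapping slice is aggregated once and reused by both windows, which is the kind of reuse opportunity the abstract refers to.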

Keywords

Data reuse; MapReduce; Big Data