Article ID: 4956499
Journal: Journal of Systems and Software
Published Year: 2017
Pages: 12 Pages
File Type: PDF
Abstract
Cloudera Impala, an analytic database system for Apache Hadoop, has a severe limitation in query plan generation: it can only generate query plans in left-deep tree form, which restricts parallel execution. In this paper, we present a logical query optimization scheme for the Impala system. First, we propose an improved version of the McCHyp (MinCutConservative Hypergraph) logical query plan generation algorithm, which reduces plan generation time by introducing a pruning strategy. Second, we propose a new cost model that takes the characteristics of the Impala system into account. Finally, we extend the Impala system to support query plans in bushy tree form by integrating the plan generation algorithm. We evaluated our scheme using the TPC-DS test suite. Experimental results show that the extended Impala system generally performs better than the original system, and that the improved plan generation algorithm takes less execution time than McCHyp. In addition, our cost model is a better fit for an Impala system that supports query plans in bushy tree form.
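To illustrate why the shape of the plan tree matters for parallel execution, the following minimal sketch compares a left-deep join tree with a balanced bushy one. It is not the McCHyp algorithm or the cost model from the paper: the table names, the `left_deep`, `bushy`, and `depth` helpers, and the use of tree depth as a rough proxy for the sequential critical path are all assumptions made for illustration, and join predicates and costs are ignored.

```python
from typing import Union

# A plan is either a table name (leaf) or a pair (left, right) denoting a join.
Plan = Union[str, tuple]


def left_deep(tables: list[str]) -> Plan:
    """Join tables one at a time; the right input of every join is a base table."""
    plan: Plan = tables[0]
    for t in tables[1:]:
        plan = (plan, t)
    return plan


def bushy(tables: list[str]) -> Plan:
    """Split the table list in half recursively; both join inputs may be joins."""
    if len(tables) == 1:
        return tables[0]
    mid = len(tables) // 2
    return (bushy(tables[:mid]), bushy(tables[mid:]))


def depth(plan: Plan) -> int:
    """Number of joins on the longest root-to-leaf path."""
    if isinstance(plan, str):
        return 0
    return 1 + max(depth(plan[0]), depth(plan[1]))


# Hypothetical table names, chosen only for the example.
tables = ["A", "B", "C", "D", "E", "F", "G", "H"]
print(depth(left_deep(tables)))  # 7
print(depth(bushy(tables)))      # 3
```

In the left-deep plan every join depends on the result of the previous one, so the eight tables form a chain of seven dependent joins. In the bushy plan, sibling subtrees are independent of each other and could in principle be evaluated concurrently, leaving only three joins on the longest path.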
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications