Article ID Journal Published Year Pages File Type
553680 Decision Support Systems 2011 13 Pages PDF
Abstract

Histograms can be useful in estimating the selectivity of queries in areas such as database query optimization and data exploration. In this paper, we propose a new histogram method for multidimensional data, called the Q-Histogram, based on the use of the quad-tree, which is a popular index structure for multidimensional data sets. The use of the compact representation of the target data obtainable from the quad-tree allows a fast construction of a histogram with the minimum number of scanning, i.e., only one scanning, of the underlying data. In addition to the advantage of computation time, the proposed method also provides a better performance than other existing methods with respect to the quality of selectivity estimation. We present a new measure of data skew for a histogram bucket, called the weighted bucket skew. Then, we provide an effective technique for skew-tolerant organization of histograms. Finally, we compare the accuracy and efficiency of the proposed method with other existing methods using both real-life data sets and synthetic data sets. The results of experiments show that the proposed method generally provides a better performance than other existing methods in terms of accuracy as well as computational efficiency.

Research highlights► Histograms can be useful in estimating the selectivity of range queries.► A new multidimensional histogram method based on the use of the quad-tree.► The use of the quad-tree can allow a fast construction of a histogram.► The proposed method generally shows better performance than other existing methods.

Related Topics
Physical Sciences and Engineering Computer Science Information Systems
Authors
, , , ,