کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
396532 670371 2013 18 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Histograms as statistical estimators for aggregate queries
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Histograms as statistical estimators for aggregate queries
چکیده انگلیسی

The traditional statistical assumption for interpreting histograms and justifying approximate query processing methods based on them is that all elements in a bucket have the same frequency—this is called uniform distribution assumption. In this paper, we analyze histograms from a statistical point of view. We show that a significantly less restrictive statistical assumption – the elements within a bucket are randomly arranged even though they might have different frequencies – leads to identical formulas for approximating aggregate queries using histograms. Under this assumption, we analyze the behavior of both unidimensional and multidimensional histograms and provide tight error guarantees for the quality of approximations. We conclude that histograms are the best estimators if the assumption holds; sampling and sketching are significantly worse. As an example of how the statistical theory of histograms can be extended, we show how XSketches – an approximation technique for XML queries that uses histograms as building blocks – can be statistically analyzed. The combination of the random shuffling assumption and the other statistical assumptions associated with XSketch estimators ensures a complete statistical model and error analysis for XSketches.


► We formulate a random shuffling assumption to characterize histograms.
► We provide tight error guarantees for histograms when the random shuffling assumption holds.
► We prove histograms are the bests estimator for aggregation when the assumption holds.
► We explain why and when histograms behave well as approximators.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 38, Issue 2, April 2013, Pages 213–230
نویسندگان
, ,