کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
379264 659282 2008 22 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Extracting k most important groups from data efficiently
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Extracting k most important groups from data efficiently
چکیده انگلیسی

We study an important data analysis operator, which extracts the k most important groups from data (i.e., the k groups with the highest aggregate values). In a data warehousing context, an example of the above query is “find the 10 combinations of product-type and month with the largest sum of sales”. The problem is challenging as the potential number of groups can be much larger than the memory capacity. We propose on-demand methods for efficient top-k groups processing, under limited memory size. In particular, we design top-k groups retrieval techniques for three representative scenarios as follows. For the scenario with data physically ordered by measure, we propose the write-optimized multi-pass sorted access algorithm (WMSA), that exploits available memory for efficient top-k groups computation. Regarding the scenario with unordered data, we develop the recursive hash algorithm (RHA), which applies hashing with early aggregation, coupled with branch-and-bound techniques and derivation heuristics for tight score bounds of hash partitions. Next, we design the clustered groups algorithm (CGA), which accelerates top-k groups processing for the case where data is clustered by a subset of group-by attributes. Extensive experiments with real and synthetic datasets demonstrate the applicability and efficiency of the proposed algorithms.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Data & Knowledge Engineering - Volume 66, Issue 2, August 2008, Pages 289–310
نویسندگان
, , ,