Article code: 396796
Journal code: 670595
Publication year: 2015
English article: 19-page PDF
Full-text version: Free download
English title of the ISI article
Clustering binary cube dimensions to compute relaxed GROUP BY aggregations
Keywords
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract

Computing cube aggregations on large, high-dimensional data sets is a crucial and costly task that normally requires multiple passes over the input table. The task gets harder as the number of result groups grows with the number of combinations of dimension values. In this research, we focus on reducing the number of aggregations and providing a more succinct result by deriving aggregations over groups of similar records. To do so, we exploit an efficient binary clustering of the fact table, which can be viewed as a relaxation of traditional OLAP cubes. We present an efficient window-based Incremental K-Means algorithm implemented in a DBMS as a user-defined function. A significant speedup is achieved through sufficient statistics, multithreading, efficient distance computation and sparse matrix operations. We experimentally compare our algorithm's performance against multiple variants of the K-Means algorithm. We show that our incremental K-Means algorithm achieves similar or better results much faster than the traditional K-Means algorithm. Moreover, we show that interesting aggregations can be efficiently obtained by using the cluster identifier as a new cube dimension.
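
The idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation, which runs inside a DBMS as a user-defined function with multithreading and sparse matrix operations. It is a simplified, single-threaded Python approximation under stated assumptions: centroids are seeded with the first k rows, they are refreshed from per-cluster sufficient statistics (a row count and a per-dimension sum) every `window` rows, and the names `incremental_kmeans`, `aggregate_by_cluster` and `window` are hypothetical.

```python
from collections import defaultdict


def sq_dist(row, centroid):
    """Squared Euclidean distance between a binary row and a centroid."""
    return sum((x - c) ** 2 for x, c in zip(row, centroid))


def nearest(row, centroids):
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(row, centroids[j]))


def incremental_kmeans(rows, k, window=2):
    """One pass over the rows, keeping only sufficient statistics per cluster.

    Assumption: the first k rows seed the centroids; the paper's own
    initialisation and update schedule may differ.
    """
    counts = [1] * k                               # rows seen per cluster
    sums = [list(rows[j]) for j in range(k)]       # per-dimension sums
    centroids = [[float(v) for v in rows[j]] for j in range(k)]
    for i, row in enumerate(rows[k:], start=1):
        j = nearest(row, centroids)
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], row)]
        if i % window == 0:                        # refresh once per window
            centroids = [[s / counts[c] for s in sums[c]] for c in range(k)]
    # Final refresh so rows after the last full window are reflected.
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


def aggregate_by_cluster(rows, measures, centroids):
    """Rough analogue of: SELECT cluster_id, SUM(measure) ... GROUP BY cluster_id."""
    totals = defaultdict(float)
    for row, m in zip(rows, measures):
        totals[nearest(row, centroids)] += m
    return dict(totals)


# Toy fact table: four binary dimensions and one numeric measure per record.
rows = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 0),
        (0, 0, 0, 1), (1, 0, 0, 0), (0, 1, 1, 1)]
measures = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

centroids = incremental_kmeans(rows, k=2, window=2)
totals = aggregate_by_cluster(rows, measures, centroids)
```

In this toy table, six distinct dimension combinations collapse into two groups of similar records, so the aggregation returns two sums instead of six. That is the sense in which the cluster identifier acts as a relaxed GROUP BY dimension.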

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Systems - Volume 53, October–November 2015, Pages 41–59
Authors
, ,