Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
396796 | 670595 | 2015 | 19-page PDF | Free download |

Computing cube aggregations on large, high-dimensional data sets is a crucial and costly task that normally requires multiple passes over the input table. The task becomes harder as the number of result groups grows with the number of combinations of dimension values. In this research, we focus on reducing the number of aggregations and providing a more succinct result by deriving aggregations over groups of similar records, exploiting an efficient binary clustering of the fact table; this can be viewed as a relaxation of traditional OLAP cubes. We present an efficient window-based Incremental K-Means algorithm implemented in a DBMS as a user-defined function. A significant speedup is achieved through sufficient statistics, multithreading, efficient distance computation and sparse matrix operations. We experimentally compare our algorithm's performance against multiple variants of the K-Means algorithm and show that our incremental K-Means algorithm achieves similar or better results much faster than the traditional K-Means algorithm. Moreover, we show that interesting aggregations can be efficiently obtained using the cluster identifier as a new cube dimension.
Journal: Information Systems - Volume 53, October–November 2015, Pages 41–59
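
To make the idea in the abstract more concrete, the following is a minimal Python sketch of a window-based incremental K-Means over binary records, followed by an aggregation that uses the cluster identifier as a grouping dimension. It is an illustration under stated assumptions, not the paper's DBMS user-defined function: the function names (`incremental_kmeans`, `aggregate_by_cluster`), the window size, the toy rows and the choice of initial centroids are all hypothetical, and the multithreading and sparse matrix optimizations mentioned above are omitted. The only element taken from the abstract is the use of per-cluster sufficient statistics (a point count and a per-dimension sum) so that centroids can be refreshed without rescanning earlier rows.

```python
from collections import defaultdict


def sq_dist(x, c):
    """Squared Euclidean distance between a record and a centroid."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def incremental_kmeans(rows, init_centroids, window=2):
    """Single pass over `rows`, refreshing centroids after each window.

    For every cluster j only two sufficient statistics are kept:
    n[j], the number of assigned records, and lin[j], the per-dimension
    sum of those records, so that centroid j equals lin[j] / n[j].
    """
    k, d = len(init_centroids), len(init_centroids[0])
    centroids = [[float(v) for v in c] for c in init_centroids]
    n = [0] * k
    lin = [[0.0] * d for _ in range(k)]
    labels = []
    for start in range(0, len(rows), window):
        for x in rows[start:start + window]:
            # assign the record to its nearest current centroid
            j = min(range(k), key=lambda c: sq_dist(x, centroids[c]))
            labels.append(j)
            n[j] += 1
            for i, xi in enumerate(x):
                lin[j][i] += xi
        # window boundary: recompute centroids from the statistics only
        for j in range(k):
            if n[j]:
                centroids[j] = [s / n[j] for s in lin[j]]
    return labels, centroids


def aggregate_by_cluster(labels, measures):
    """SUM(measure) GROUP BY cluster id, i.e. the cluster as a dimension."""
    totals = defaultdict(float)
    for j, m in zip(labels, measures):
        totals[j] += m
    return dict(totals)


# Hypothetical fact table: four binary dimensions plus one numeric measure.
rows = [(1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 1),
        (0, 0, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
measures = [10.0, 20.0, 5.0, 7.0, 30.0, 3.0]

labels, centroids = incremental_kmeans(rows, [(1, 1, 0, 0), (0, 0, 1, 1)])
totals = aggregate_by_cluster(labels, measures)
```

In this toy run the six records collapse into two groups, so a single grouped sum over the cluster identifier stands in for the many group-by combinations a full cube over four dimensions would produce. The trade-off, as the abstract notes, is that the result is a relaxation of the exact cube rather than a replacement for it.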