Article ID: 395695 | Journal: Information Sciences | Year: 2006 | Pages: 33
Abstract

The normalization of a data cube is the ordering of the attribute values along each dimension. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that computing an optimal normalization is NP-hard even for 1 × 3 chunks, although we find an exact algorithm for 1 × 2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(dn log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19–30% more efficient than ROLAP, but normalization can improve it by a further 9–13%, for a total gain of 29–44% over ROLAP.
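As a rough illustration of dimension-wise attribute frequency sorting, the Python sketch below relabels each dimension's attribute values by decreasing frequency and compares chunked storage cost before and after. The cost model is an assumption for illustration, not necessarily the one used in the paper: an allocated chunk is stored either densely at one unit per cell slot or sparsely at a fixed cost per allocated cell (here 2, i.e. d/2 + 1 for d = 2), whichever is cheaper, and chunks with no allocated cells are not stored. The function names `frequency_sort_normalization`, `apply_normalization`, and `holap_cost` are hypothetical.

```python
from collections import Counter
from math import prod
from typing import List, Set, Tuple

Cell = Tuple[int, ...]  # coordinates of one allocated (non-empty) cell


def frequency_sort_normalization(cells: Set[Cell], shape: Tuple[int, ...]) -> List[List[int]]:
    """Return one permutation per dimension: perm[dim][old_value] = new_position.

    Attribute values are ranked by how many allocated cells use them, most
    frequent first; Python's sort is stable, so ties keep their original order.
    """
    perms: List[List[int]] = []
    for dim, size in enumerate(shape):
        counts = Counter(cell[dim] for cell in cells)  # one pass over the allocated cells
        order = sorted(range(size), key=lambda v: -counts[v])  # O(n log n) per dimension
        perm = [0] * size
        for new_pos, old_value in enumerate(order):
            perm[old_value] = new_pos
        perms.append(perm)
    return perms


def apply_normalization(cells: Set[Cell], perms: List[List[int]]) -> Set[Cell]:
    """Relabel every cell's coordinates with the per-dimension permutations."""
    return {tuple(perms[dim][v] for dim, v in enumerate(cell)) for cell in cells}


def holap_cost(cells: Set[Cell], chunk_shape: Tuple[int, ...], sparse_cell_cost: int) -> int:
    """Assumed cost model: each non-empty chunk costs min(dense, sparse)."""
    per_chunk = Counter(tuple(v // c for v, c in zip(cell, chunk_shape)) for cell in cells)
    dense_cost = prod(chunk_shape)  # one unit per cell slot in the chunk
    return sum(min(dense_cost, m * sparse_cell_cost) for m in per_chunk.values())


if __name__ == "__main__":
    # Toy 4 x 4 cube with 2 x 2 chunks; sparse cost 2 per cell is an assumption.
    cells = {(0, 0), (0, 3), (3, 0), (3, 3)}
    shape, chunk_shape = (4, 4), (2, 2)
    print(holap_cost(cells, chunk_shape, 2))  # 8
    perms = frequency_sort_normalization(cells, shape)
    print(holap_cost(apply_normalization(cells, perms), chunk_shape, 2))  # 4
```

In this toy cube the four allocated cells start in four different 2 × 2 chunks, each stored sparsely; after frequency sorting they land in a single chunk that is cheaper to store densely. Sorting each of the d dimensions costs O(n log n), which matches the O(dn log(n)) bound stated in the abstract; counting the frequencies adds one pass over the allocated cells.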
