Reducing the Multidimensionality of OLAP Cubes with Genetic Algorithms and Multiple Correspondence Analysis

Article ID	Journal	Published Year	Pages	File Type
484411	Procedia Computer Science	2015	8 Pages	PDF

Abstract

A data warehouse is an information system used for data reporting and analysis. It results from the collection, preprocessing and storage of massive integrated data from disparate and heterogeneous sources. Data warehouses contain historical information and constitute a decision support system. When the number of dimensions in the data warehouse is too large, it may be difficult for the analyst to explore the OLAP cubes in search of knowledge and hidden patterns. It is then useful to provide the analyst with convenient tools to identify superfluous dimensions that can be deleted without losing important information. This makes it possible to create new, parsimonious OLAP cubes in place of the sparse original ones. In this context, we propose an approach based on Genetic Algorithms (GAs) combined with Multiple Correspondence Analysis (MCA). Our approach aims to identify a subset of n superfluous dimensions from the initial set of p dimensions where n
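To make the idea of searching for a subset of superfluous dimensions concrete, the following is a minimal sketch of a genetic algorithm over binary masks, where a 1 at position i marks dimension i as superfluous. It is an illustration under stated assumptions, not the authors' method: the abstract does not say how MCA enters the evaluation, so the scoring function `toy_fitness` and the per-dimension `contribution` values are hypothetical placeholders, and the function name `evolve_mask` and all parameter values are likewise assumptions.

```python
import random


def evolve_mask(p, fitness, pop_size=30, generations=40,
                mutation_rate=0.05, seed=0):
    """Search for a binary mask over p dimensions (1 = flagged as superfluous).

    Assumes p >= 2 and pop_size >= 3. `fitness` maps a mask to a number;
    higher is better.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(p)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, p - 1)
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return best


# Hypothetical per-dimension "information" scores. In the approach described
# above, such a score would presumably come from the MCA step; these numbers
# are made up for illustration only.
contribution = [0.30, 0.01, 0.25, 0.02, 0.40, 0.02]


def toy_fitness(mask):
    """Placeholder score: reward each removed dimension, penalize lost information."""
    removed = sum(mask)
    lost = sum(c for c, m in zip(contribution, mask) if m)
    return removed - 10 * lost


best = evolve_mask(len(contribution), toy_fitness)
print("dimensions flagged as superfluous:",
      [i for i, m in enumerate(best) if m])
```

The fitness function is passed in as a parameter so that a score derived from an actual MCA of the cube could replace the placeholder without changing the search loop.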