Article ID: 6853972
Journal: Data & Knowledge Engineering
Published Year: 2017
Pages: 40
File Type: PDF
Abstract
In traditional OLAP systems, the ETL process loads all available data into the data warehouse before users start querying them. In some cases, this may be either inconvenient (because the data are supplied by a provider for a fee) or infeasible (because of their size); on the other hand, launching each analysis query directly on the source data would not enable data reuse, leading to poor performance and high costs. The alternative investigated in this paper is to fetch and store data on demand, i.e., as they are needed during the analysis process. To this end, we propose the Query-Extract-Transform-Load (QETL) paradigm to feed a multidimensional cube; the idea is to fetch facts from the source data provider, load them into the cube only when they are needed to answer some OLAP query, and drop them when free space is needed to load other facts. Notably, QETL includes an optimization step to cheaply extract the required data based on the specific features of the data provider. Experimental tests on a real case study in the genomics area show that QETL effectively reuses data to cut extraction costs, leading to significant performance improvements.
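The fetch-on-demand idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' QETL implementation: the class name `OnDemandCube`, the `fetch` callback, the cell keys, and the least-recently-used eviction policy are all assumptions made for illustration, and the provider-specific optimization step is not modeled.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Iterable, List


class OnDemandCube:
    """Toy cube that loads facts lazily and drops old ones when full.

    Hypothetical sketch: eviction is least-recently-used, which the
    abstract does not specify.
    """

    def __init__(
        self,
        fetch: Callable[[List[Hashable]], Dict[Hashable, float]],
        capacity: int,
    ) -> None:
        self.fetch = fetch          # stands in for the source data provider
        self.capacity = capacity    # maximum number of facts kept in the cube
        self.facts: "OrderedDict[Hashable, float]" = OrderedDict()
        self.fetched_cells = 0      # proxy for extraction cost

    def query(self, cells: Iterable[Hashable]) -> Dict[Hashable, float]:
        cells = list(dict.fromkeys(cells))  # de-duplicate, keep order
        # Only cells not already in the cube are requested from the provider.
        missing = [c for c in cells if c not in self.facts]
        fetched = self.fetch(missing) if missing else {}
        self.fetched_cells += len(missing)

        # Answer the query from stored facts plus newly fetched ones.
        result: Dict[Hashable, float] = {}
        for c in cells:
            if c in self.facts:
                self.facts.move_to_end(c)   # mark as recently used
                result[c] = self.facts[c]
            else:
                result[c] = fetched[c]

        # Load the new facts, then drop the oldest ones if space is needed.
        for c, value in fetched.items():
            self.facts[c] = value
        while len(self.facts) > self.capacity:
            self.facts.popitem(last=False)
        return result


# Hypothetical source data keyed by (gene, year).
SOURCE = {("geneA", 2016): 1.0, ("geneB", 2016): 2.0, ("geneC", 2016): 3.0}


def provider(cells: List[Hashable]) -> Dict[Hashable, float]:
    return {c: SOURCE[c] for c in cells}


cube = OnDemandCube(provider, capacity=2)
cube.query([("geneA", 2016), ("geneB", 2016)])  # both cells are fetched
cube.query([("geneA", 2016)])                   # answered from the cube
```

In this sketch the second query triggers no call to the provider, which is the kind of reuse the abstract credits with cutting extraction costs; a real system would replace the per-cell counter with the provider's actual cost model.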
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence