Article ID Journal Published Year Pages File Type
10325409 Information Systems 2005 17 Pages PDF
Abstract
In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,