Article ID: 395363
Journal: Information Sciences
Published Year: 2009
Pages: 23 Pages
File Type: PDF
Abstract

A top-k query returns the k tuples with the highest (or the lowest) scores from a relation. The score is computed by combining the values of one or more attributes. We focus on top-k queries with monotone linear score functions. Layer-based methods are well-known techniques for top-k query processing. These methods organize a database as a single list of layers, where the ith layer contains the tuples that can be the top-i tuple. Thus, these methods answer top-k queries by reading at most k layers. Query performance, however, is poor when the number of tuples in each layer (simply, the layer size) is large. In this paper, we propose a new layer-ordering method, called the Partitioned-Layer Index (simply, the PL Index), that significantly improves query performance by reducing the layer size. The PL Index uses the notion of partitioning, which constructs a database as multiple sublayer lists instead of a single layer list, thereby reducing the layer size. The PL Index also uses the convex skyline, which is a subset of the skyline, to construct each sublayer and further reduce the layer size. The PL Index has the following desirable properties. Its query performance is quite insensitive to the weights of the attributes in the score function (called the preference vector) and is approximately linear in the value of k. The PL Index can also tune query performance for the most frequently used value of k by controlling the number of sublayer lists. Experimental results using synthetic and real data sets show that the PL Index significantly outperforms existing methods in query performance, except for small values of k (say, k ≤ 9).
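The layer-based idea described in the abstract can be illustrated with a minimal sketch. It is not the PL Index itself: for simplicity it builds plain skyline layers rather than convex-skyline sublayers, and it keeps a single layer list with no partitioning. It assumes that lower scores are better and that all weights are nonnegative, so the score function is monotone. The function names (`score`, `dominates`, `build_layers`, `top_k`) are hypothetical and chosen only for this illustration.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def score(t: Sequence[float], w: Sequence[float]) -> float:
    """Linear score: weighted sum of attribute values (lower is better here)."""
    return sum(wi * ti for wi, ti in zip(w, t))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if a is no worse in every attribute and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def build_layers(tuples: List[Point]) -> List[List[Point]]:
    """Peel off skyline layers: layer 1 is the skyline of all tuples,
    layer 2 is the skyline of what remains, and so on.
    (Assumption: plain skyline layers stand in for the convex skyline.)"""
    remaining = list(tuples)
    layers: List[List[Point]] = []
    while remaining:
        layer = [t for t in remaining
                 if not any(dominates(o, t) for o in remaining)]
        layers.append(layer)
        in_layer = set(layer)
        remaining = [t for t in remaining if t not in in_layer]
    return layers


def top_k(layers: List[List[Point]], w: Sequence[float], k: int) -> List[Point]:
    """Answer a top-k query by reading only the first k layers."""
    candidates = [t for layer in layers[:k] for t in layer]
    return heapq.nsmallest(k, candidates, key=lambda t: score(t, w))


if __name__ == "__main__":
    data = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1),
            (3, 8), (5, 5), (8, 3), (6, 6), (9, 9)]
    layers = build_layers(data)
    print(top_k(layers, w=(0.5, 0.5), k=3))
```

Reading only the first k layers is sufficient in this sketch because a tuple in layer j is dominated by at least one tuple in each earlier layer, so at least j - 1 tuples score no worse than it under any monotone score function. The sketch also shows the cost the abstract points to: the candidate set grows with the size of each layer, which is what the PL Index aims to reduce.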

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence