Article ID Journal Published Year Pages File Type
379213 Data & Knowledge Engineering 2006 36 Pages PDF
Abstract

The K-Closest-Pairs Query (K-CPQ), a type of distance join in spatial databases, discovers the K pairs of objects formed from two different datasets with the K smallest distances. Recently, branch-and-bound algorithms based on R-trees have been developed in order to answer K-CPQs efficiently. For query optimization purposes, analytical models are needed to estimate the processing cost of a specific query in order to evaluate alternative execution plans. In this paper, we combine techniques that have been used for the analysis of nearest neighbor and spatial join queries, and derive the performance cost (in terms of disk accesses) of K-CPQs using R-trees. Moreover, we present two interesting extensions of the cost model for K-CPQs, one exploiting the buffering management using R-trees and another for a second type of distance join, the so-called buffer queries. The proposed cost models are verified under a variety of distributions in 2-dimensional space on both synthetic and real datasets, shown to achieve accurate estimations of the measured experimental results.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , , ,