Article ID Journal Published Year Pages File Type
396679 Information Systems 2015 16 Pages PDF
Abstract

•We substantially improve the space and time complexity of the Quickjoin algorithm.•We show how to select good pivots for Quickjoin.•We generalize the Quickjoin algorithm to handle k-nearest neighbor joins.

We consider the join operation in metric spaces. Given two sets A and B   of objects drawn from some universe UU, we want to compute the set A⋈B={(a,b)∈A×B|d(a,b)≤r} efficiently, where d:U×U→R+d:U×U→R+ is a metric distance function and r∈R+r∈R+ is user supplied query radius. In particular we are interested in the case where we have no index available (nor we can afford to build it) for either A or B. In this paper we improve the Quickjoin algorithm [6], inspired by the well-known Quicksort algorithm, by (i) replacing the low level component that handles small subsets with essentially brute-force nested loop with a more efficient method; (ii) showing that, contrary to Quicksort, in Quickjoin unbalanced partitioning can improve the algorithm; (iii) making the algorithm probabilistic while still obtaining most of the relevant results; and (iv) improving the (worst case) work space needed, from O(n2)O(n2) to O(logn), where n is the size of the dataset(s). We also discuss pivot selection, show how to adapt Quickjoin to compute k-nearest neighbor joins and sketch a parallel version of the algorithm. The experimental results show that the method works well in practice.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,