Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
396679 | Information Systems | 2015 | 16 Pages |
•We substantially improve the space and time complexity of the Quickjoin algorithm.•We show how to select good pivots for Quickjoin.•We generalize the Quickjoin algorithm to handle k-nearest neighbor joins.
We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe UU, we want to compute the set A⋈B={(a,b)∈A×B|d(a,b)≤r} efficiently, where d:U×U→R+d:U×U→R+ is a metric distance function and r∈R+r∈R+ is user supplied query radius. In particular we are interested in the case where we have no index available (nor we can afford to build it) for either A or B. In this paper we improve the Quickjoin algorithm [6], inspired by the well-known Quicksort algorithm, by (i) replacing the low level component that handles small subsets with essentially brute-force nested loop with a more efficient method; (ii) showing that, contrary to Quicksort, in Quickjoin unbalanced partitioning can improve the algorithm; (iii) making the algorithm probabilistic while still obtaining most of the relevant results; and (iv) improving the (worst case) work space needed, from O(n2)O(n2) to O(logn), where n is the size of the dataset(s). We also discuss pivot selection, show how to adapt Quickjoin to compute k-nearest neighbor joins and sketch a parallel version of the algorithm. The experimental results show that the method works well in practice.