Quicker range- and k-NN joins in metric spaces

Article ID	Journal	Published Year	Pages	File Type
396679	Information Systems	2015	16 Pages	PDF

Abstract

•We substantially improve the space and time complexity of the Quickjoin algorithm.•We show how to select good pivots for Quickjoin.•We generalize the Quickjoin algorithm to handle k-nearest neighbor joins.

We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe UU, we want to compute the set A⋈B={(a,b)∈A×B|d(a,b)≤r} efficiently, where d:U×U→R+d:U×U→R+ is a metric distance function and r∈R+r∈R+ is user supplied query radius. In particular we are interested in the case where we have no index available (nor we can afford to build it) for either A or B. In this paper we improve the Quickjoin algorithm [6], inspired by the well-known Quicksort algorithm, by (i) replacing the low level component that handles small subsets with essentially brute-force nested loop with a more efficient method; (ii) showing that, contrary to Quicksort, in Quickjoin unbalanced partitioning can improve the algorithm; (iii) making the algorithm probabilistic while still obtaining most of the relevant results; and (iv) improving the (worst case) work space needed, from O(n2)O(n2) to O(logn), where n is the size of the dataset(s). We also discuss pivot selection, show how to adapt Quickjoin to compute k-nearest neighbor joins and sketch a parallel version of the algorithm. The experimental results show that the method works well in practice.

Keywords

k-Nearest Neighbors Metric spaces Range searching