کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
431357 688513 2009 18 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Solving similarity joins and range queries in metric spaces with the list of twin clusters
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Solving similarity joins and range queries in metric spaces with the list of twin clusters
چکیده انگلیسی

The metric space model abstracts many proximity or similarity problems, where the most frequently considered primitives are range and k-nearest neighbor search, leaving out the similarity join, an extremely important primitive. In fact, despite the great attention that this primitive has received in traditional and even multidimensional databases, little has been done for general metric databases.We solve two variants of the similarity join problem: (1) range joins: Given two sets of objects and a distance threshold r, find all the object pairs (one from each set) at distance at most r; and (2) k-closest pair joins: Find the k closest object pairs (one from each set). For this sake, we devise a new metric index, coined List of Twin Clusters (LTC), which indexes both sets jointly, instead of the natural approach of indexing one or both sets independently. Finally, we show how to use the LTC in order to solve classical range queries. Our results show significant speedups over the basic quadratic-time naive alternative for both join variants, and that the LTC is competitive with the original list of clusters when solving range queries. Furthermore, we show that our technique has a great potential for improvements.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Discrete Algorithms - Volume 7, Issue 1, March 2009, Pages 18–35
نویسندگان
, ,