کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
430579 688051 2012 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Pivot selection: Dimension reduction for distance-based indexing
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Pivot selection: Dimension reduction for distance-based indexing
چکیده انگلیسی

Distance-based indexing exploits only the triangle inequality to answer similarity queries in metric spaces. Lacking coordinate structure, mathematical tools in RnRn can only be applied indirectly, making it difficult to theoretically study metric-space indexing. Toward solving this problem, a common algorithmic step is to select a small number of special points, called pivots  , and map the data objects to a low-dimensional space, one dimension for each pivot, where each dimension represents the distances of a pivot to the data objects. We formalize a “pivot space model” where all the data objects are used as pivots such that data is mapped from metric space to RnRn, preserving all the pairwise distances under L∞L∞. With this model, it can be shown that the indexing problem in metric space can be equivalently studied in RnRn. Further, we show the necessity of dimension reduction for RnRn and that the only effective form of dimension reduction is to select existing dimensions, i.e. pivot selection. The coordinate structure of RnRn makes the application of many mathematical tools possible. In particular, Principle Component Analysis (PCA) is incorporated into a heuristic method for pivot selection and shown to be effective over a large range of workloads. We also show that PCA can be used to reliably measure the intrinsic dimension of a metric space.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Discrete Algorithms - Volume 13, May 2012, Pages 32–46
نویسندگان
, , ,