کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
396783 670591 2016 19 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Singleton indexes for nearest neighbor search
ترجمه فارسی عنوان
شاخص‌های سینگلتون برای جستجوی نزدیکترین همسایه
کلمات کلیدی
جستجوی نزدیکترین همسایه؛ شاخص تنظیم. شاخص متریک؛ انتخاب محوری
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی


• A description and analysis of the new metric indexing methodology ANNI.
• A series of indexes based on this methodology.
• An extensive experimental comparison with the current state of the art.
• An opensource implementation of all the presented algorithms.

The nearest neighbor search problem is fundamental in computer science, and in spite of the effort of a vast number of research groups, the instances allowing an efficient solution are reduced to databases of objects of small intrinsic dimensions. For intrinsically high-dimensional data, the only possible solution is to compromise and use approximate or probabilistic approaches. For the rest of the instances in the middle, there is an overwhelmingly large number of indexes of claimed good performance. However, the problem of parameter selection makes them unwieldy for use outside of the research community. Even if the indexes can be tuned correctly, either the number of operations for the index construction and tuning is prohibitively large or there are obscure parameters to tune-up. Those restrictions force users from different fields to use brute force to solve the problem in real world instances.In this paper, we present a family of indexing algorithms designed for end users. They require as input, the database, a query sample and the amount of space available. Our building blocks are standard discarding rules, and the indexes will add routing objects such as pivots, hyperplane references or cluster centroids. Those indexes are built incrementally and will self-tune by greedily searching for a global optimum in performance.We experimentally show that using this oblivious strategy our indexes are able to outperform state of the art, manually fine-tuned indexes. For example, our indexes are twice as fast than the fastest alternative (LC, EPT or VPT) for most of our datasets. In the case of LC, the faster alternative for high dimensional datasets, the difference is smaller than 5%. In the same case, our indexes are at least one order of magnitude faster to build. This superior performance is maintained for large, high dimensional datasets (100 million 12-dimensional objects). In this benchmark, our best index is two times faster than the closest alternative (VPT), six times faster than the majority of indexes, and more than sixty times faster than the sequential scan.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 60, August–September 2016, Pages 50–68
نویسندگان
, , ,