کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
396867 670610 2014 8 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Approximate nearest neighbor algorithm based on navigable small world graphs
ترجمه فارسی عنوان
الگوریتم نزدیکترین همسایگی را براساس نمودار های کوچک جهان قابل حمل نزدیک کنید
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی


• Novel approximate k-nearest neighbor problem structure is presented.
• The structure is based on a navigable small world graph.
• The structure has weak dimensionality dependence.
• The structure has shown superior performance in simulations.

We propose a novel approach to solving the approximate k-nearest neighbor search problem in metric spaces. The search structure is based on a navigable small world graph with vertices corresponding to the stored elements, edges to links between them, and a variation of greedy algorithm for searching. The navigable small world is created simply by keeping old Delaunay graph approximation links produced at the start of construction. The approach is very universal, defined in terms of arbitrary metric spaces and at the same time it is very simple. The algorithm handles insertions in the same way as queries: by finding approximate neighbors for the inserted element and connecting it to them. Both search and insertion can be done in parallel requiring only local information from the structure. The structure can be made distributed. The accuracy of the probabilistic k-nearest neighbor queries can be adjusted without rebuilding the structure.The performed simulation for data in the Euclidean spaces shows that the structure built using the proposed algorithm has small world navigation properties with log2(n)log2(n) insertion and search complexity at fixed accuracy, and performs well at high dimensionality. Simulation on a CoPHiR dataset revealed its high efficiency in case of large datasets (more than an order of magnitude less metric computations at fixed recall) compared to permutation indexes. Only 0.03% of the 10 million 208-dimensional vector dataset is needed to be evaluated to achieve 0.999 recall (virtually exact search). For recall 0.93 processing speed 2800 queries/s can be achieved on a dual Intel X5675 Xenon server node with Java implementation.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 45, September 2014, Pages 61–68
نویسندگان
, , , ,