Article code: 515638
Journal code: 867057
Publication year: 2012
English article: 18-page PDF
Full-text version: Free download
English title of the ISI article
Large-scale similarity data management with distributed Metric Index
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract

Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management remains an open research challenge. In this work, we take an important step towards a management system able to scale to data collections of billions of objects. We propose a distributed index structure for similarity data management called the Metric Index (M-Index), which can answer queries in both a precise and an approximate manner. This technique can use any distributed hash table that supports interval queries as its underlying index. We have performed numerous experiments to test various settings of the M-Index structure, and we have demonstrated its usability by developing a full-featured, publicly available Web application.
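
The abstract does not spell out how metric objects become keys that an interval-query store can index. As a rough illustration only, the sketch below assumes a simple single-level pivot mapping (close in spirit to iDistance) and is not the M-Index algorithm itself. The names PivotKeyIndex and d_max are hypothetical, and a sorted in-memory list stands in for a distributed hash table with interval queries.

```python
import bisect
import math


class PivotKeyIndex:
    """Toy pivot-based index: maps each object to a 1-D key and answers
    range queries through interval scans over the sorted keys."""

    def __init__(self, pivots, d_max, dist=math.dist):
        self.pivots = pivots
        self.d_max = d_max    # assumed to exceed every distance in the data
        self.dist = dist      # any metric (must satisfy the triangle inequality)
        self.keys = []        # sorted keys (stand-in for the key space of a DHT)
        self.items = []       # objects, aligned with self.keys

    def key(self, obj):
        # Key = index of the closest pivot + normalised distance to that pivot,
        # so all objects of pivot i fall into the key interval [i, i + 1).
        dists = [self.dist(p, obj) for p in self.pivots]
        i = min(range(len(dists)), key=dists.__getitem__)
        return i + dists[i] / self.d_max

    def insert(self, obj):
        k = self.key(obj)
        pos = bisect.bisect_right(self.keys, k)
        self.keys.insert(pos, k)
        self.items.insert(pos, obj)

    def range_query(self, q, r, eps=1e-9):
        """Return all indexed objects o with dist(q, o) <= r."""
        result = []
        for i, p in enumerate(self.pivots):
            d = self.dist(p, q)
            # Triangle inequality: dist(q, o) <= r implies
            # d - r <= dist(p, o) <= d + r, which bounds the key interval.
            lo = i + max(d - r, 0.0) / self.d_max - eps
            hi = i + (d + r) / self.d_max + eps
            start = bisect.bisect_left(self.keys, lo)
            # Never scan past the end of pivot i's own key interval.
            end = min(bisect.bisect_right(self.keys, hi),
                      bisect.bisect_left(self.keys, i + 1))
            for obj in self.items[start:end]:
                if self.dist(q, obj) <= r:   # verify each candidate exactly
                    result.append(obj)
        return result
```

Each pivot contributes one interval scan per query, which is the kind of operation an interval-capable key store could serve; candidates from the scan are always checked against the true distance, so the pruning affects cost, not correctness.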

Research highlights
► New distributed index structure for similarity data management.
► Algorithms for distributed range and k-nearest-neighbors searching.
► Precise and approximate evaluation strategies.
► High scalability confirmed by experiments on a 100-million-image dataset.
► Stable recall of the approximate search when the dataset grows.

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 48, Issue 5, September 2012, Pages 855–872