کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
397270 671023 2011 13 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Metric Index: An efficient and scalable solution for precise and approximate similarity search
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Metric Index: An efficient and scalable solution for precise and approximate similarity search
چکیده انگلیسی

Metric space is a universal and versatile model of similarity that can be applied in various areas of information retrieval. However, a general, efficient, and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering, thus reaching high search performance while having constant building costs per object. The heart of the M-Index is a general mapping mechanism that enables to actually store the data in established structures such as the B+-tree or even in a distributed storage. We implemented the M-Index with the B+-tree and performed experiments on two datasets—the first is an artificial set of vectors and the other is a real-life dataset composed of a combination of five MPEG-7 visual descriptors extracted from a database of up to several million digital images. The experiments put several M-Index variants under test and compare them with established techniques for both precise and approximate similarity search. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Further, the M-Index demonstrates excellent ability to keep similar data close in the index which makes its approximation algorithm very efficient—maintaining practically constant response times while preserving a very high recall as the dataset grows and even beating approaches designed purely for approximate search.

Research Highlights
► M-Index employs all principles of metric space partitioning, pruning and filtering.
► Data is actually stored in well-established structures such as B+-tree.
► Experiments performed on complex real-life data with several million objects.
► M-Index keeps similar data close, which makes approximate algorithm very efficient.
► Almost constant scalability for approximate search.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 36, Issue 4, June 2011, Pages 721–733
نویسندگان
, , ,