Metric Index: An efficient and scalable solution for precise and approximate similarity search

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
397270	671023	2011	13 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش صفحه اول مقاله

Metric Index: An efficient and scalable solution for precise and approximate similarity search

چکیده انگلیسی

Metric space is a universal and versatile model of similarity that can be applied in various areas of information retrieval. However, a general, efficient, and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering, thus reaching high search performance while having constant building costs per object. The heart of the M-Index is a general mapping mechanism that enables to actually store the data in established structures such as the B+-tree or even in a distributed storage. We implemented the M-Index with the B+-tree and performed experiments on two datasets—the first is an artificial set of vectors and the other is a real-life dataset composed of a combination of five MPEG-7 visual descriptors extracted from a database of up to several million digital images. The experiments put several M-Index variants under test and compare them with established techniques for both precise and approximate similarity search. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Further, the M-Index demonstrates excellent ability to keep similar data close in the index which makes its approximation algorithm very efficient—maintaining practically constant response times while preserving a very high recall as the dataset grows and even beating approaches designed purely for approximate search.

Research Highlights
► M-Index employs all principles of metric space partitioning, pruning and filtering.
► Data is actually stored in well-established structures such as B+-tree.
► Experiments performed on complex real-life data with several million objects.
► M-Index keeps similar data close, which makes approximate algorithm very efficient.
► Almost constant scalability for approximate search.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 36, Issue 4, June 2011, Pages 721–733

نویسندگان

David Novak, Michal Batko, Pavel Zezula,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Metric Index: An efficient and scalable solution for precise and approximate similarity search

دسترسی سریع

ارتباط

English Website