Article ID Journal Published Year Pages File Type
396917 Information Systems 2013 12 Pages PDF
Abstract

An efficient and universal similarity search solution is a holy grail for multimedia information retrieval. Most similarity indexes work by mapping the original multimedia objects into simpler representations, which are then searched by proximity using a suitable distance function.In this paper we introduce a novel representation of the objects as sets of integers, with a distance that is computed using set operations. This allows us to use compressed inverted indexes, which have become a mature technology that excels in this type of search. Such indexes allow processing queries in main memory even for very large databases, so that the disk is accessed only to verify a few candidates and present the results.We show that our technique achieves very good speed/compression/recall tradeoffs. As an example, to reach 92% recall in 30-nearest neighbor searches, an index using 1 to 2.5 bytes per object inspects less than 0.6% of the actual objects. Furthermore, the ratio between the distances to the actual nearest neighbors and those reported by our algorithm is very close to 1.

► We present a compressed index for general metric/similarity spaces. ► Our index is very fast even for high intrinsic and large datasets. ► Our index requires to review a very small portion of the database. ► It keeps an excellent tradeoff between memory, search time, recall, and proximity ratio. ► We present experimental results supporting our claims for both synthetic and real world databases.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,