Statistical quantization for similarity search

Journal: Computer Vision and Image Understanding
Published: 2014
Pages: 9
Article ID: 525579

Abstract

• Formulate a k-means hashing model based on generalized likelihood ratio analysis.
• Introduce statistical analysis into the out-of-sample extension of quantization.
• Generalize the observation to the product quantization family.

Approximate nearest neighbor search has attracted much attention recently because it allows fast queries with a predictable sacrifice in search quality. Among the related works, k-means quantizers are arguably the most adaptive methods and have shown higher search accuracy than the alternatives. However, traditional quantizers share a common problem: during out-of-sample extension, the naive strategy assigns a new sample using only similarities in Euclidean space, without taking into account the statistical and geometric properties of the data. To cope with this problem, this paper proposes a novel approach based on generalized likelihood ratio analysis. In particular, the proposed method makes a physically meaningful decision about which of the learned Voronoi cells each new sample belongs to. This decision essentially imposes a measure of statistical consistency on the out-of-sample extension. Experiments on two large data sets show that the proposed method is more effective than the benchmark algorithms.
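The abstract contrasts the naive out-of-sample rule (assign a new sample to its nearest centroid in Euclidean space) with an assignment that accounts for the statistics of each Voronoi cell. The sketch below illustrates that contrast in pure Python. It does not reproduce the paper's generalized likelihood ratio formulation, which the abstract does not spell out; as a hypothetical stand-in, it models each cell as an axis-aligned Gaussian fitted to its training members and assigns a new sample to the cell under which it is most likely. All function names (`kmeans`, `cell_variances`, `assign_euclidean`, `assign_likelihood`) are illustrative assumptions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in Euclidean space.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cell empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


def cell_variances(points, labels, centroids, floor=1e-6):
    """Per-cell, per-dimension variance (a diagonal covariance model, assumed here)."""
    dim = len(centroids[0])
    variances = []
    for j, mu in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            variances.append([1.0] * dim)  # hypothetical fallback for empty cells
            continue
        var = [sum((p[d] - mu[d]) ** 2 for p in members) / len(members) for d in range(dim)]
        variances.append([max(v, floor) for v in var])  # floor avoids division by zero
    return variances


def assign_euclidean(x, centroids):
    """Naive out-of-sample extension: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))


def gaussian_loglik(x, mu, var):
    """Log-density of x under an axis-aligned Gaussian N(mu, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def assign_likelihood(x, centroids, variances):
    """Statistics-aware stand-in: the cell under which x is most likely."""
    return max(range(len(centroids)),
               key=lambda j: gaussian_loglik(x, centroids[j], variances[j]))


if __name__ == "__main__":
    # A tight cell at 0 and a wide cell at 3 (1-D for readability).
    centroids = [[0.0], [3.0]]
    variances = [[0.01], [4.0]]
    query = [1.2]
    print(assign_euclidean(query, centroids))              # 0: closer to the tight cell
    print(assign_likelihood(query, centroids, variances))  # 1: far more plausible under the wide cell
```

Here the query is slightly closer to the tight centroid, yet it lies many standard deviations away from it while sitting comfortably inside the wide cell; Euclidean-only assignment ignores exactly this kind of statistical information. The diagonal Gaussian is chosen only to keep the example short and is not claimed to be the model used in the paper.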

Keywords

Quantization; Computer vision; Similarity search; Hashing; Binary code; Machine learning