Article ID Journal Published Year Pages File Type
412054 Neurocomputing 2015 10 Pages PDF
Abstract

●We prove that the variance of the p-th dimension is equivalent to E((xip–xjp)2) and a good way to measure dispersion under the assumption that data are independent and identically distributed (i.i.d.). xip is the p-th dimension of the i-th data and xjp is the p-th dimension of the j-th data.●An effective and efficient hashing method which we call adaptive bit allocation hashing (ABAH) is proposed.●Based on the naive ABAH scheme, we propose an improved version of ABAH.●Our method can achieve good results even when the data are anisotropic.

Using hashing algorithms to learn binary codes representation of data for fast approximate nearest neighbor (ANN) search has attracted more and more attention. Most existing hashing methods employ various hash functions to encode data. The resulting binary codes can be obtained by concatenating bits produced by those hash functions. These methods usually have two main steps: projection and thresholding. One problem with these methods is that every dimension of the projected data is regarded as of same importance and encoded by one bit, which may result in ineffective codes. In this paper, we introduce an adaptive bit allocation hashing (ABAH) method to encode data for ANN search. The basic idea is, according to the dispersions of all the dimensions after projection we use different numbers of bits to encode them. In our method, more bits will be adaptively allocated to encode dimensions with larger dispersion while fewer bits for dimensions with smaller dispersion. This novel bit allocation scheme makes our hashing method effectively preserve the neighborhood structure in the original data space. Extensive experiments show that the proposed ABAH significantly outperforms other state-of-the-art methods for ANN search task.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,