Article ID: 532452
Journal: Journal of Visual Communication and Image Representation
Published Year: 2015
Pages: 10
File Type: PDF
Abstract

• We question and analyse the basic assumptions made for the visual words approach.
• We give experimental support for the following three statements.
• There are more visually distinct patterns than can be listed in a codebook.
• One element of a codebook represents a set of many visually distinct patterns.
• There are no single, selective SIFT descriptors to serve as codebook elements.

Codebooks are a widely accepted technique for recognising objects from sets of local features. The method has been applied to many classes of objects, even very abstract ones. Although state-of-the-art recognition rates have been reported, the method is still far from reliable in any sense comparable to human vision. The literature on this topic emphasises detailed descriptions of statistical estimators over a basic analysis of the data. A deeper understanding of the data is, however, needed for the field to develop further. In this paper, we therefore present a set of quantitative experiments on codebooks of the popular SIFT descriptors. The results discourage the use of illustrative but oversimplified descriptions of the visual words approach. In particular, we demonstrate that (1) there are more visually distinct patterns than can be listed in a codebook, (2) one element of a codebook represents a set of many visually distinct patterns, and (3) there are no single, selective SIFT descriptors to serve as codebook elements. This raises the question of why the method works at all. We discuss several possible explanations.
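
For readers unfamiliar with the visual words approach, the following is a minimal sketch of how a codebook is commonly built and used. It is not the authors' experimental setup. It assumes plain k-means as the quantiser, uses random 128-dimensional vectors as stand-ins for real SIFT descriptors, and picks an arbitrary codebook size of 16. The function names `build_codebook`, `assign_words` and `word_histogram` are hypothetical.

```python
import numpy as np


def assign_words(descriptors, centroids):
    """Return, for each descriptor, the index of its nearest centroid."""
    # Squared Euclidean distance between every descriptor and every centroid.
    diff = descriptors[:, None, :] - centroids[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(descriptors, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm); the k centroids are the 'visual words'."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct training descriptors.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].copy()
    for _ in range(iters):
        labels = assign_words(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def word_histogram(descriptors, centroids):
    """Describe one image by how often each visual word occurs in it."""
    labels = assign_words(descriptors, centroids)
    return np.bincount(labels, minlength=len(centroids))


# Assumption: random vectors stand in for SIFT descriptors (128-D each).
rng = np.random.default_rng(1)
train = rng.random((500, 128)).astype(np.float32)
codebook = build_codebook(train, k=16)

# Assumption: a hypothetical image that yields 40 local descriptors.
image_descriptors = rng.random((40, 128)).astype(np.float32)
hist = word_histogram(image_descriptors, codebook)
print(hist)
```

In this sketch every descriptor is forced onto its nearest centroid, so a single codebook element necessarily stands for all descriptors in its cell, however different they look. That is the setting in which statement (2) above applies.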

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition