Region Contextual Visual Words for scene categorization

Article ID	Journal	Published Year	Pages	File Type
385607	Expert Systems with Applications	2011	7 Pages	PDF

Abstract

This paper proposes a method for scene categorization that integrates region contextual information into the popular Bag-of-Visual-Words approach. The Bag-of-Visual-Words approach describes an image as a bag of discrete visual words, and the frequency distribution of these words is used for image categorization. However, traditional visual words have difficulty with patches that have similar appearances but distinct semantic concepts. This drawback stems from constructing each visual word independently. This paper introduces a Region-Conditional Random Fields model that learns each visual word depending on the other visual words in the same region. Compared with the traditional Conditional Random Fields model, there are two areas of novelty. First, the initial label of each patch is defined automatically from its visual feature rather than assigned manually as a semantic label. Second, a novel potential function is built under the region contextual constraint. Experimental results on three well-known datasets show that Region Contextual Visual Words improve categorization performance compared with traditional visual words.
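As a rough illustration of the traditional Bag-of-Visual-Words representation described in the abstract, the sketch below assigns each patch descriptor to its nearest codeword and builds a normalized word-frequency histogram. It shows only the baseline, not the Region-Conditional Random Fields model. The function names (`quantize`, `bovw_histogram`), the Euclidean nearest-codeword rule, and the toy 2-D descriptors and codebook are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import dist


def quantize(descriptor, codebook):
    """Return the index of the codeword nearest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """Return the normalized frequency of each visual word over an image's patches."""
    counts = Counter(quantize(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total if total else 0.0 for k in range(len(codebook))]


# Toy data (hypothetical): 2-D patch descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

hist = bovw_histogram(descriptors, codebook)
print(hist)
```

Each patch is quantized independently of its neighbours, so two patches with similar descriptors always receive the same word regardless of the region they come from. This is the limitation that the region contextual constraint is meant to address.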

► The Bag-of-Visual-Words representation has recently become popular for scene classification.
► However, visual words learned in an unsupervised manner have difficulty with patches that have similar appearances but correspond to distinct semantic concepts.
► This paper proposes a Region-Conditional Random Fields model, which integrates region contextual information to address this problem.
► Compared with the traditional Conditional Random Fields model, there are two areas of novelty: first, the initial label of each patch is defined automatically from its visual feature rather than assigned manually as a semantic label; second, a novel potential function is built under the region contextual constraint.
► Experimental results on three well-known datasets show that Region Contextual Visual Words improve categorization performance compared with traditional visual words.

Keywords

Visual word; Conditional random fields; Scene categorization