Feature fusion within local region using localized maximum-margin learning for scene categorization

Article ID	Journal	Published Year	Pages	File Type
532412	Pattern Recognition	2012	13 Pages	PDF

Abstract

In visual recognition tasks such as scene categorization, representing an image by its local features (e.g., the bag-of-visual-word (BOVW) model and the bag-of-contextual-visual-word (BOCVW) model) has become one of the most popular and successful approaches. In this paper, we propose a method that uses localized maximum-margin learning to fuse different types of features during BOCVW modeling for eventual scene classification. The proposed method fuses multiple features at the stage where either the best contextual visual word is selected to represent a local region (hard assignment) or the probabilities of the candidate contextual visual words used to represent the unknown region are estimated (soft assignment). The method has two merits: (1) errors caused by the ambiguity of a single feature when assigning local regions to contextual visual words can be corrected, or the probabilities of the candidate contextual visual words can be estimated more accurately; and (2) it offers a more flexible way of fusing these features, because the similarity metric is determined locally by localized maximum-margin learning. The proposed method has been evaluated experimentally, and the results indicate its effectiveness.
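
The difference between hard and soft assignment under feature fusion can be illustrated with a minimal sketch. It is not the paper's implementation: the per-feature weights are fixed by hand here, whereas the abstract describes learning the local similarity metric by localized maximum-margin learning. The function names, the squared Euclidean distance, the softmax form of the soft assignment, and the toy feature values are all assumptions made for illustration.

```python
import numpy as np

def fused_distances(region_feats, word_feats, weights):
    """Weighted sum of per-feature squared distances from one region to K candidate words.

    region_feats: list of 1-D arrays, one per feature type (hypothetical layout)
    word_feats:   list of (K, d_f) arrays, one per feature type
    weights:      1-D array with one non-negative weight per feature type
                  (assumed fixed here; the paper learns the metric locally)
    """
    per_feature = np.stack([
        np.sum((words - region) ** 2, axis=1)
        for region, words in zip(region_feats, word_feats)
    ])  # shape (F, K)
    return np.asarray(weights) @ per_feature  # shape (K,)

def hard_assign(dist):
    """Hard assignment: index of the single closest candidate word."""
    return int(np.argmin(dist))

def soft_assign(dist, beta=1.0):
    """Soft assignment: softmax over negative distances (an assumed form)."""
    logits = -beta * (dist - dist.min())  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy data: one region, three candidate words, two feature types.
region = [np.array([0.0, 0.0]), np.array([1.0])]
words = [
    np.array([[0.1, 0.0], [0.0, 0.2], [1.0, 1.0]]),  # feature type 1
    np.array([[3.0], [1.0], [1.0]]),                 # feature type 2
]

single = fused_distances(region, words, [1.0, 0.0])  # first feature only
fused = fused_distances(region, words, [0.5, 0.5])   # both features

single_choice = hard_assign(single)
fused_choice = hard_assign(fused)
fused_probs = soft_assign(fused)
```

In this toy data the first feature alone slightly favours word 0, while the second feature strongly disagrees, so the fused distance selects word 1. This is the kind of single-feature ambiguity that fusion at the local region is meant to resolve.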

► Different features are fused at the local-region level using maximum-margin learning.
► Errors in selecting visual words from a single feature can be corrected.
► The method offers a more flexible way of fusing multiple features at the local-region level.
► It is superior to single-feature-based methods.
► It is superior (or equivalent) to global feature fusion methods.

Keywords

Image recognition; Scene categorization; Feature fusion