Weakly-supervised scene parsing with multiple contextual cues

Article ID	Journal	Published Year	Pages	File Type
391983	Information Sciences	2015	14 Pages	PDF

Abstract

Scene parsing, fully labeling an image with each region corresponding to a label, is one of the core problems of computer vision. Previous methods to this problem usually rely on patch-level models trained from well labeled data. In this paper, we propose a weakly-supervised scene parsing algorithm that semantically parses a collection of images with multi-label, which is guided by the top-down category models and bottom-up local patch contexts across images that closely related segments usually have similar labels. Images are segmented to patches on multi-level and the contextual relations of patches are discovered via sparse representation by ℓ1 minimization, based on which a graph is constructed. The multi-level spatial context of patches is also embedded in the graph, based on which image-level labels can be propagated to segments optimally. The contextual patch labeling process is formulated in an optimization framework and solved by a convergent iterative method. The category models are learned from the decomposed label representations of the image set and applied to the segments. Final labeling is obtained by combining all the information on pixel level. The effectiveness of the proposed method is demonstrated in experiments on two benchmark datasets and comparisons are taken.

Keywords

Scene parsing