Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
392022 | Information Sciences | 2015 | 11 Pages |
Scene understanding, a high-level cognitive function in humans, has long been difficult to replicate in computer vision and machine learning. Much research has attempted to address the problem, but it remains a great challenge because the process involves both top-down and bottom-up processing in the human cognitive system. The deficiencies of 2D data have been demonstrated, especially for indoor scenes, which are highly complex, and researchers are now turning to 3D data. In this paper, we present an approach to indoor scene understanding based on monocular RGB-D images. We explore significant features and structural information from depth images, assess their applicability to indoor scenes, and integrate them with conventional 2D information throughout the indoor scene understanding process. Additionally, we formulate indoor scene understanding within a global optimization framework comprising four sub-tasks: segmentation, support inference, multi-object recognition, and scene classification. Experiments demonstrate that our approach significantly outperforms state-of-the-art algorithms in all of these sub-tasks.