Article ID Journal Published Year Pages File Type
6937648 Computer Vision and Image Understanding 2016 12 Pages PDF
Abstract
Learning human activity models from streaming videos should be a continuous process, as new activities arrive over time. However, recent approaches to human activity recognition are usually batch methods, which assume that all training instances are labeled and available in advance. Among such methods, exploiting the inter-relationships among the various objects in the scene (termed context) has proved extremely promising. Conversely, many state-of-the-art approaches learn human activity models continuously but do not exploit contextual information. In this paper, we propose a novel framework that continuously learns both the appearance and the context models of complex human activities from streaming videos. We automatically construct a conditional random field (CRF) graphical model to encode the mutual contextual information among the activities and the related object attributes. To reduce the amount of manual labeling of incoming instances, we exploit active learning to select the training instances that are most informative with respect to both the appearance and the context models, and use them to incrementally update these models. Rigorous experiments on four challenging datasets demonstrate that our framework outperforms state-of-the-art approaches with significantly less manually labeled data.
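The abstract's active-learning step selects the incoming instances the current models are least certain about. A minimal sketch of one common selection criterion, entropy-based uncertainty sampling, is shown below; the paper's actual informativeness measure (which also involves the context model) is not specified in the abstract, so the function names and example distributions here are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted label distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_informative(predictions, k):
    """Return indices of the k instances whose predicted label
    distributions have the highest entropy, i.e. the instances
    most worth sending to a human annotator."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical per-instance class posteriors from the current activity model
preds = [
    [0.90, 0.05, 0.05],  # confident prediction -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform -> high entropy, most informative
    [0.60, 0.30, 0.10],
]
print(select_most_informative(preds, 1))  # -> [1]
```

Only the selected instances are manually labeled and used to incrementally update the appearance and context models, which is how the framework keeps the labeling budget low.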
Related Topics
Physical Sciences and Engineering Computer Science Computer Vision and Pattern Recognition