Article ID: 11002864
Journal: Pattern Recognition
Published Year: 2019
Pages: 32
File Type: PDF
Abstract
Predicting complex human activities from a partially observed video is valuable in many practical applications but remains a challenging problem. When a video is only partially observed, maximizing the representational power of the given video is more important than modeling the temporal dynamics of the activity. In this paper, we propose a novel human activity descriptor for prediction that maximizes the discriminative power of a system in a compact and efficient way using pre-trained deep networks. Specifically, the proposed descriptor captures potentially important pairwise relationships between objects without prior knowledge or preset attributes. This relationship information is incorporated automatically during descriptor construction, based on the objects' participation ratios and their local and global motion activations. Pre-trained convolutional neural networks are used without any additional model training. From a practical point of view, avoiding training makes the proposed method more cost-effective when implementing a smart surveillance system. In the experiments, we evaluate the proposed method in two respects: (1) prediction accuracy at different observation ratios, and (2) the effect of pre-trained network and layer selection. Experimental results on five public datasets verify the efficacy of the proposed method, which outperforms competing methods and delivers stable, high performance regardless of network selection.
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition