Prediction of partially observed human activity based on pre-trained deep representation

Article ID	Journal	Published Year	Pages	File Type
11002864	Pattern Recognition	2019	32 Pages	PDF

Abstract

Predicting complex human activities from a partially observed video is valuable in many practical applications, but it is a challenging problem. When only part of a video is observed, maximizing the representational power of the observed portion is more important than modeling the temporal dynamics of the activity. In this paper, we propose a novel human activity descriptor for prediction that maximizes the discriminative power of the system in a compact and efficient way using pre-trained deep networks. Specifically, the proposed descriptor captures potentially important pairwise relationships between objects without prior knowledge or preset attributes. This relationship information is reflected automatically during descriptor construction, based on each object's participation ratio and on local and global motion activations. Pre-trained Convolutional Neural Networks are used without any additional training. From a practical point of view, the proposed method is more cost-effective for implementing a smart surveillance system. In the experiments, we evaluate the proposed method in two respects: (1) prediction accuracy at different observation ratios, and (2) the effect of the choice of pre-trained network and layer. Experimental results on five public datasets verify the efficacy of the proposed method, which outperforms competing methods and maintains consistently high performance regardless of the network selected.
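The abstract describes the descriptor only at a high level, so the following is a minimal Python/NumPy sketch under explicit assumptions, not the authors' method. It assumes per-frame activations of one pre-trained CNN layer have already been extracted as an array of shape (T, C, H, W), and it treats each channel as a stand-in "object". The helper `observe` mimics a partially observed video by keeping the leading fraction of frames given by the observation ratio. The hypothetical `pairwise_descriptor` uses proxies chosen here for illustration only: a channel's participation ratio is the fraction of observed frames whose peak activation exceeds a threshold, local motion is the per-frame peak absolute spatial change averaged over time, and global motion is the mean absolute change of the spatially pooled response. These weights scale the pooled responses, and the upper triangle of their channel-by-channel co-occurrence matrix forms the descriptor.

```python
import math

import numpy as np


def observe(acts: np.ndarray, ratio: float) -> np.ndarray:
    """Keep the first ceil(ratio * T) frames (at least one) to mimic a partially observed video."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # round() absorbs float error, e.g. 0.07 * 100 == 7.000000000000001
    t_obs = max(1, math.ceil(round(ratio * acts.shape[0], 6)))
    return acts[:t_obs]


def pairwise_descriptor(acts: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """HYPOTHETICAL descriptor; acts has shape (T, C, H, W), one channel per stand-in 'object'.

    The participation and motion formulas below are illustrative proxies,
    not the definitions used in the paper.
    """
    t, c = acts.shape[:2]
    pooled = acts.mean(axis=(2, 3))  # (T, C): global average pooling per frame
    peak = acts.max(axis=(2, 3))  # (T, C): strongest spatial response per channel
    participation = (peak > thresh).mean(axis=0)  # (C,): fraction of frames a channel fires
    if t > 1:
        diff = np.abs(np.diff(acts, axis=0))  # (T-1, C, H, W): frame-to-frame change
        local_motion = diff.max(axis=(2, 3)).mean(axis=0)  # (C,): peak local change, time-averaged
        global_motion = np.abs(np.diff(pooled, axis=0)).mean(axis=0)  # (C,): pooled change
    else:
        # A single observed frame carries no motion information
        local_motion = np.zeros(c)
        global_motion = np.zeros(c)
    weights = participation * (1.0 + local_motion + global_motion)  # (C,) assumed weighting
    weighted = pooled * weights  # (T, C): broadcast channel weights over frames
    rel = weighted.T @ weighted / t  # (C, C): second-order channel co-occurrence
    desc = rel[np.triu_indices(c)]  # keep each unordered channel pair once
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.random((20, 8, 7, 7))  # stand-in for real CNN activations
    d = pairwise_descriptor(observe(acts, 0.5))
    print(d.shape)  # (36,)
```

With 20 frames and an observation ratio of 0.5, only the first 10 frames are used, and 8 channels yield 8 × 9 / 2 = 36 pairwise entries. The threshold `thresh` and every weighting formula are assumptions; the resulting vector could be passed to any standard classifier.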

Keywords

Human interaction