Article ID: 6940131
Journal: Pattern Recognition Letters
Published Year: 2018
Pages: 7
File Type: PDF
Abstract
First-person action recognition is an active research area, driven by the increasing popularity of wearable devices. Action classification for first-person video (FPV) is more challenging than conventional action classification due to strong egocentric motions, frequent viewpoint changes, and diverse global motion patterns. To tackle these challenges, we introduce a two-stream convolutional neural network that improves action recognition via long-term fusion pooling operators. The proposed method effectively captures the temporal structure of actions by leveraging a series of frame-wise features of both appearance and motion. Our experiments validate the effect of the feature pooling operators and show that the proposed method achieves state-of-the-art performance on standard action datasets.
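To illustrate the long-term pooling idea described in the abstract, the sketch below pools frame-wise features from two streams (appearance and motion) into a single video-level descriptor. This is a minimal NumPy illustration under assumptions: the function names, the choice of max/mean pooling, and concatenation-based fusion are generic stand-ins, not the paper's exact operators.

```python
import numpy as np

def temporal_pool(frame_features: np.ndarray, mode: str = "max") -> np.ndarray:
    """Pool a sequence of frame-wise features (T, D) into one video-level
    descriptor (D,). Illustrative stand-in for long-term pooling operators."""
    if mode == "max":
        return frame_features.max(axis=0)
    if mode == "mean":
        return frame_features.mean(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")

def fuse_two_stream(appearance: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Concatenate pooled appearance and motion descriptors into one video
    representation, as in a generic two-stream pipeline (hypothetical fusion)."""
    return np.concatenate([temporal_pool(appearance), temporal_pool(motion)])

# Toy example: 10 frames, 4-dim appearance features, 4-dim motion features.
rng = np.random.default_rng(0)
video_repr = fuse_two_stream(rng.normal(size=(10, 4)), rng.normal(size=(10, 4)))
print(video_repr.shape)  # (8,)
```

Pooling over the full frame sequence, rather than classifying frames independently, is what lets such a model summarize the temporal structure of an action.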
Related Topics
Physical Sciences and Engineering; Computer Science; Computer Vision and Pattern Recognition