Article ID: 527047
Journal: Image and Vision Computing
Published Year: 2014
Pages: 13
File Type: PDF
Abstract

• A motion boundary based sampling strategy is proposed for dense trajectories.
• A set of 3D co-occurrence descriptors is developed to describe cuboids.
• Two decomposition strategies are presented to further improve performance.
• We achieve state-of-the-art results on several human action datasets.

Recent studies have witnessed the success of Bag-of-Features (BoF) frameworks for video-based human action recognition. The detection and description of local interest regions are two fundamental problems in the BoF framework. In this paper, we propose a motion boundary based sampling strategy and spatial-temporal (3D) co-occurrence descriptors for action video representation and recognition. Our sampling strategy is partly inspired by the recent success of dense trajectory (DT) based features [Wang et al., 2013] for action recognition. In contrast to DT, we densely sample spatial-temporal cuboids only along motion boundaries, which greatly reduces the number of valid trajectories while preserving discriminative power. Moreover, we develop a set of 3D co-occurrence descriptors that take into account the spatial-temporal context within local cuboids and deliver rich information for recognition. Furthermore, we decompose each 3D co-occurrence descriptor at the pixel level and at the bin level and integrate the decomposed components within a multi-channel framework, which improves performance significantly. To evaluate the proposed methods, we conduct extensive experiments on three benchmarks: KTH, YouTube and HMDB51. The results show that our sampling strategy significantly reduces the computational cost of point tracking without degrading performance. At the same time, our approach outperforms state-of-the-art methods: we report 95.6% on KTH, 87.6% on YouTube and 51.8% on HMDB51.
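The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea of keeping dense grid points that fall on a motion boundary. It assumes a precomputed dense optical flow field and treats the magnitude of the flow's spatial derivatives as the motion boundary measure; the function names, the `stride` and `threshold` parameters, and their default values are hypothetical and are not taken from the paper.

```python
import numpy as np


def motion_boundary_magnitude(flow):
    """Return the (H, W) magnitude of the spatial derivatives of a flow field.

    flow: (H, W, 2) array holding the horizontal (u) and vertical (v)
    displacement of every pixel. Large values mark places where the
    motion changes abruptly, i.e. candidate motion boundaries.
    """
    u, v = flow[..., 0], flow[..., 1]
    du_dy, du_dx = np.gradient(u)  # derivatives along rows, then columns
    dv_dy, dv_dx = np.gradient(v)
    return np.sqrt(du_dx**2 + du_dy**2 + dv_dx**2 + dv_dy**2)


def sample_on_motion_boundary(flow, stride=5, threshold=0.5):
    """Keep only the dense grid points that lie on a motion boundary.

    stride and threshold are illustrative assumptions, not values
    reported by the authors. Returns an (N, 2) array of (row, col) points.
    """
    mag = motion_boundary_magnitude(flow)
    height, width = mag.shape
    ys, xs = np.mgrid[0:height:stride, 0:width:stride]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1)
    keep = mag[grid[:, 0], grid[:, 1]] > threshold
    return grid[keep]


# Synthetic example: the right half of the frame moves, the left half is static.
flow = np.zeros((20, 20, 2))
flow[:, 10:, 0] = 2.0
points = sample_on_motion_boundary(flow)
dense_count = len(np.mgrid[0:20:5, 0:20:5][0].ravel())
print(f"kept {len(points)} of {dense_count} dense grid points")
```

In this toy frame only the grid points sitting on the vertical edge between the static and moving regions survive, which illustrates how boundary-based sampling can cut the number of points that need to be tracked.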

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition