Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
527047 | Image and Vision Computing | 2014 | 13 Pages |
• A motion boundary based sampling strategy is proposed for dense trajectories.
• A set of 3D co-occurrence descriptors is developed to describe cuboids.
• Two decomposition strategies are presented to further improve performance.
• We achieve state-of-the-art results on several human action datasets.
Recent studies have demonstrated the success of Bag-of-Features (BoF) frameworks for video-based human action recognition. The detection and description of local interest regions are two fundamental problems in the BoF framework. In this paper, we propose a motion boundary based sampling strategy and spatial-temporal (3D) co-occurrence descriptors for action video representation and recognition. Our sampling strategy is partly inspired by the recent success of dense trajectory (DT) based features [Wang et al., 2013] for action recognition. Compared with DT, we densely sample spatial-temporal cuboids along motion boundaries, which greatly reduces the number of valid trajectories while preserving their discriminative power. Moreover, we develop a set of 3D co-occurrence descriptors that take into account the spatial-temporal context within local cuboids and deliver rich information for recognition. Furthermore, we decompose each 3D co-occurrence descriptor at the pixel level and at the bin level and integrate the decomposed components within a multi-channel framework, which improves performance significantly. To evaluate the proposed methods, we conduct extensive experiments on three benchmarks: KTH, YouTube and HMDB51. The results show that our sampling strategy significantly reduces the computational cost of point tracking without degrading performance. Meanwhile, our approach outperforms state-of-the-art methods. We report accuracies of 95.6% on KTH, 87.6% on YouTube and 51.8% on HMDB51.
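The idea of sampling along motion boundaries can be illustrated with a minimal sketch. It assumes that motion boundaries are measured as the spatial gradients of the horizontal and vertical optical-flow components, which is the quantity MBH descriptors use. It also assumes that a dense grid point is kept when its boundary magnitude exceeds a fraction of the frame maximum. The function names, the grid step and the relative threshold are hypothetical, because the abstract does not give the paper's exact selection criterion. The flow field is taken as input rather than computed.

```python
import numpy as np


def motion_boundary_magnitude(flow):
    """Per-pixel motion-boundary strength of a dense optical-flow field.

    flow: float array of shape (H, W, 2) holding the horizontal (u) and
    vertical (v) flow components. Motion boundaries are taken here as the
    spatial derivatives of u and v (assumption, as in MBH descriptors).
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    du_dy, du_dx = np.gradient(flow[..., 0])
    dv_dy, dv_dx = np.gradient(flow[..., 1])
    return np.sqrt(du_dx**2 + du_dy**2 + dv_dx**2 + dv_dy**2)


def sample_on_motion_boundaries(flow, step=5, rel_threshold=0.1):
    """Dense grid sampling restricted to motion-boundary pixels.

    Hypothetical rule: keep a grid point if its motion-boundary magnitude
    exceeds rel_threshold times the largest magnitude in the frame.
    Returns an (N, 2) integer array of (x, y) coordinates.
    """
    mag = motion_boundary_magnitude(flow)
    h, w = mag.shape
    # Regular dense grid with spacing `step`, offset to the cell centers.
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    ys, xs = ys.ravel(), xs.ravel()
    peak = mag.max()
    if peak == 0:
        # Uniform (or zero) flow has no motion boundaries, so nothing is sampled.
        return np.empty((0, 2), dtype=int)
    keep = mag[ys, xs] > rel_threshold * peak
    return np.stack([xs[keep], ys[keep]], axis=1)


if __name__ == "__main__":
    # Synthetic flow: a 20x20 square translating right over a static background.
    flow = np.zeros((60, 60, 2))
    flow[18:38, 18:38, 0] = 2.0
    points = sample_on_motion_boundaries(flow, step=5)
    print(f"kept {len(points)} of 144 grid points")
```

In the toy example, only grid points on the edges of the moving square survive. Points in the static background and in the uniformly moving interior are discarded. This mirrors, under the stated assumptions, how restricting sampling to motion boundaries can cut down the number of points that must be tracked.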
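A 3D co-occurrence descriptor can be sketched in the same spirit as a gray-level co-occurrence matrix extended to space-time. Values inside a cuboid are quantized into a fixed number of levels, and pairs of values separated by a displacement (dt, dy, dx) are counted. This is an assumption-laden illustration, not the paper's descriptor. Which quantity is quantized (for example intensity, gradient orientation or flow orientation), the set of offsets and the normalization are all hypothetical choices made here.

```python
import numpy as np


def cooccurrence_3d(cuboid, levels, offset):
    """Normalized co-occurrence matrix of a quantized space-time cuboid.

    cuboid: integer array of shape (T, H, W) with values in [0, levels).
    offset: (dt, dy, dx) displacement with non-negative components.
    Entry [i, j] is the fraction of voxel pairs (p, p + offset), both inside
    the cuboid, whose values are i and j respectively.
    """
    dt, dy, dx = offset
    t, h, w = cuboid.shape
    # Aligned views: `first` holds each voxel p, `second` the voxel p + offset.
    first = cuboid[:t - dt, :h - dy, :w - dx]
    second = cuboid[dt:, dy:, dx:]
    mat = np.zeros((levels, levels))
    # np.add.at accumulates correctly even when an (i, j) index repeats.
    np.add.at(mat, (first.ravel(), second.ravel()), 1.0)
    total = mat.sum()
    return mat / total if total > 0 else mat


def cooccurrence_descriptor(cuboid, levels, offsets):
    """Concatenate flattened co-occurrence matrices over several offsets."""
    return np.concatenate(
        [cooccurrence_3d(cuboid, levels, off).ravel() for off in offsets]
    )


if __name__ == "__main__":
    # Toy 2x2x2 cuboid: level 0 in the first frame, level 1 in the second.
    cuboid = np.zeros((2, 2, 2), dtype=int)
    cuboid[1] = 1
    # Hypothetical offsets: one temporal step and one horizontal spatial step.
    desc = cooccurrence_descriptor(cuboid, levels=2, offsets=[(1, 0, 0), (0, 0, 1)])
    print(desc)
```

Concatenating the normalized matrices over several offsets gives a fixed-length vector of `len(offsets) * levels**2` entries. Temporal offsets (dt > 0) are what let the descriptor capture spatial-temporal context rather than purely spatial texture. Such a vector could then be quantized in a BoF pipeline like any other local descriptor.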