Article ID: 407293
Journal: Neurocomputing
Published Year: 2016
Pages: 9 Pages
File Type: PDF
Abstract

Recently, as an efficient representation of realistic videos, improved trajectory features (ITF) combined with Fisher vector (FV) encoding have achieved state-of-the-art results on four challenging action recognition datasets. However, directly integrating this representation with a simple spatio-temporal pyramid (STP) degrades performance. Therefore, this paper proposes a novel version of the cluster trees model that improves recognition performance by taking into account the spatio-temporal relationships between local trajectory features. We modify and improve the cluster trees model to reduce noisy clusters and alleviate intra-class variation. A further advantage of the proposed method is that it significantly reduces memory storage and computation time by conducting dimensionality reduction on the Fisher vectors. Finally, an adaptive kernel is proposed to efficiently compare the variable-size tree representations of two videos for action recognition, which mitigates the risk introduced by noisy cluster tree nodes. Experimental results on four challenging action datasets (i.e., Olympic Sports, Hollywood2, HMDB51 and UCF50) demonstrate the effectiveness and robustness of the proposed approach, which outperforms the current state of the art.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence