Article ID: 6941109
Journal: Pattern Recognition Letters
Published Year: 2015
Pages: 10
File Type: PDF
Abstract
In this paper, we propose a new multi-layer Fisher vector encoding method based on trajectory descriptors for human action recognition. The proposed method aims to improve on the classical shallow Fisher vector (FV) encoding method. Our main contribution is a progressive representation of the geometric relationships among trajectories. The representation is built on three nested layers and provides deep, discriminative structures by applying local spatial pooling and refining the representation from one layer to the next. To preserve more information in the feature encoding process, both fine and large spatio-temporal structures are used. Fine structures exploit local spatio-temporal information by building graphs of trajectories, while large structures exploit global spatio-temporal information through spatio-temporal video subdivision. Our approach is evaluated on three popular, large human action datasets: Hollywood2, Olympic Sports and HMDB51. Experiments show that adding layers yields higher action classification accuracy, which supports the effectiveness of our multi-layer Fisher vector encoding method.
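The "large structures" mentioned in the abstract rely on spatio-temporal video subdivision. The abstract does not give the grid layout, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a regular 2x2x2 grid, that each trajectory is summarized by a mean position (x, y, t), and hypothetical helper names `cell_index` and `pool_by_cell`.

```python
from collections import defaultdict

def cell_index(x, y, t, width, height, duration, grid=(2, 2, 2)):
    """Map a position (x, y, t) to a cell of an nx-by-ny-by-nt grid.

    The grid shape is an assumption made for illustration only.
    """
    nx, ny, nt = grid
    # Clamp so that points on the far border fall into the last cell.
    ix = min(int(x / width * nx), nx - 1)
    iy = min(int(y / height * ny), ny - 1)
    it = min(int(t / duration * nt), nt - 1)
    return ix, iy, it

def pool_by_cell(trajectories, width, height, duration, grid=(2, 2, 2)):
    """Group trajectory descriptors by the grid cell containing them.

    `trajectories` is an iterable of ((x, y, t), descriptor) pairs.
    """
    cells = defaultdict(list)
    for (x, y, t), descriptor in trajectories:
        key = cell_index(x, y, t, width, height, duration, grid)
        cells[key].append(descriptor)
    return dict(cells)
```

In a typical pipeline of this kind, the descriptors in each cell would then be encoded separately (for example with a Fisher vector) and the per-cell encodings concatenated; whether the paper does exactly this is not stated in the abstract.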
Related Topics
Physical Sciences and Engineering; Computer Science; Computer Vision and Pattern Recognition