Article ID | Journal | Published Year | Pages |
---|---|---|---|
6941109 | Pattern Recognition Letters | 2015 | 10 Pages |
Abstract
In this paper, we propose a new multi-layer Fisher vector encoding method based on trajectory descriptors for human action recognition. The proposed method aims to improve on the classical shallow Fisher vector (FV) encoding method. Our main contribution is a progressive representation of the geometric relationships among trajectories. This representation is based on three nested layers and provides deep and discriminant structures by applying local spatial pooling and refining the representation from one layer to the next. To preserve more information in the feature encoding process, both fine and large spatio-temporal structures are used. Fine structures exploit local spatio-temporal information by building graphs of trajectories, while large structures exploit global spatio-temporal information through spatio-temporal video subdivision. Our approach is evaluated on three popular, large human action datasets: Hollywood2, Olympic Sports and HMDB51. Experiments show that adding layers yields higher action classification accuracy, which demonstrates the capability of our multi-layer Fisher vector encoding method.
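For context, the classical shallow FV encoding that the abstract takes as its baseline can be sketched as follows. This is a minimal NumPy illustration of the standard improved Fisher vector over a diagonal-covariance Gaussian mixture model (GMM), not the authors' multi-layer method. The function name `fisher_vector` and its parameter names are assumptions made for this sketch, and the GMM parameters are assumed to have been fitted beforehand on training descriptors.

```python
import numpy as np


def fisher_vector(descriptors, weights, means, sigmas):
    """Encode a set of local descriptors as an improved Fisher vector.

    Hypothetical helper for illustration only (not taken from the paper).

    descriptors: (T, D) array of local descriptors, T >= 1.
    weights:     (K,)   GMM mixture weights, summing to 1.
    means:       (K, D) GMM component means.
    sigmas:      (K, D) per-dimension standard deviations (diagonal covariance).
    Returns a vector of length 2 * K * D.
    """
    T = descriptors.shape[0]

    # Deviation of every descriptor from every component, scaled by sigma: (T, K, D).
    diff = (descriptors[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Log of weight * Gaussian density, up to a constant shared by all components.
    log_post = (
        np.log(weights)
        - np.sum(np.log(sigmas), axis=1)
        - 0.5 * np.sum(diff ** 2, axis=2)
    )
    # Soft assignments (posteriors), computed stably: (T, K).
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Gradients with respect to the means and the standard deviations: (K, D) each.
    g_mu = np.einsum("tk,tkd->kd", post, diff) / (T * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("tk,tkd->kd", post, diff ** 2 - 1.0) / (
        T * np.sqrt(2.0 * weights)[:, None]
    )

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power normalisation followed by L2 normalisation.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

With K mixture components and D-dimensional descriptors the output has 2KD dimensions. A multi-layer scheme such as the one described above would presumably apply an encoding of this kind repeatedly over pooled local structures, but the details of that procedure are not given in the abstract.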
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Manel Sekma, Mahmoud Mejdoub, Chokri Ben Amar