Human action recognition based on multi-layer Fisher vector encoding method

Article ID	Journal	Published Year	Pages	File Type
6941109	Pattern Recognition Letters	2015	10 Pages	PDF

Abstract

In this paper, we propose a new multi-layer Fisher vector encoding method based on trajectory descriptors for human action recognition. The proposed method aims to improve on the classical shallow Fisher vector (FV) encoding method. Our main contribution is a progressive representation of the geometric relationships among trajectories. The representation is built on three nested layers and provides deep, discriminative structures by applying local spatial pooling and refining the representation from one layer to the next. To preserve more information in the feature encoding process, both fine and large spatio-temporal structures are used. Fine structures exploit local spatio-temporal information by building graphs of trajectories, while large structures exploit global spatio-temporal information through spatio-temporal subdivision of the video. Our approach is evaluated on three popular, large human action datasets: Hollywood2, Olympic Sports and HMDB51. Experiments show that adding layers yields higher action classification accuracy, which demonstrates the effectiveness of our multi-layer Fisher vector encoding method.
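For readers unfamiliar with the shallow baseline that the abstract refers to, the following is a minimal sketch of classical single-layer Fisher vector encoding with a diagonal-covariance Gaussian mixture model. It is not the multi-layer method proposed in the paper. The function name `fisher_vector`, its parameter names, and the use of power and L2 normalization (the common "improved FV" variant) are assumptions made for illustration, and the GMM parameters are assumed to have been fitted beforehand on training descriptors.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode local descriptors X (N, D) as one Fisher vector of length 2*K*D.

    Hypothetical helper: weights (K,), means (K, D) and variances (K, D)
    describe a diagonal-covariance GMM assumed to be fitted elsewhere.
    """
    N, _ = X.shape
    diff = X[:, None, :] - means[None, :, :]            # (N, K, D)

    # Log-density of each descriptor under each Gaussian component.
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Soft assignments (posteriors), computed stably in the log domain.
    log_post = np.log(weights) + log_prob               # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients with respect to the component means and standard deviations.
    u = diff / np.sqrt(variances)                       # whitened differences
    g_mu = np.einsum('nk,nkd->kd', gamma, u)
    g_mu /= N * np.sqrt(weights)[:, None]
    g_sigma = np.einsum('nk,nkd->kd', gamma, u ** 2 - 1.0)
    g_sigma /= N * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power (signed square-root) normalization followed by L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

In a shallow pipeline, one such vector would typically be computed from all trajectory descriptors of a video and passed to a linear classifier. A multi-layer scheme would instead apply the encoding to local groups of descriptors and encode the results again at the next layer.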

Keywords

Human action recognition