Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4969014 | Image and Vision Computing | 2016 | 11 Pages |
Abstract
Human action recognition from still image has recently drawn increasing attention in human behavior analysis and also poses great challenges due to the huge inter ambiguity and intra variability. Vector of locally aggregated descriptors (VLAD) has achieved state-of-the-art performance in many image classification tasks based on local features. The great success of VLAD is largely due to its high descriptive ability and computational efficiency. In this paper, towards optimal VLAD representations for human action recognition from still images, we improve VLAD by tackling three important issues including empty cavity, ambiguity and pooling strategies. The empty cavity limits the performance of VLAD and has long been overlooked. We investigate the empty cavity and provide an effective solution to deal with it, which improves the performance of VLAD; we enhance the codewords with middle level of assignments which are more reliable and can provide more useful information for realistic activity; we propose incorporating the generalized max pooling to replace sum pooling in VLAD, which is more reliable for the final representation. We have conducted extensive experiments on four widely-used benchmarks to validate the proposed method for human action recognition from still images. Our method produces competitive performance with state-of-the-art algorithms.
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Zhang Lei, Li Changxi, Peng Peipei, Xiang Xuezhi, Song Jingkuan,