Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
6957826 | Signal Processing | 2018 | 29 |
Abstract
Action recognition in videos, which contain complex semantic content, remains a challenging task in computer vision research. In this paper, we propose a novel attention mechanism that leverages the gate system of Long Short-Term Memory (LSTM) to compute attention weights for action recognition. The proposed attention mechanism is embedded in a recurrent attention network that explores the spatial-temporal relations between different local regions to concentrate on the important ones. For more accurate attention, we derive a new attention unit from the standard LSTM unit so that the importance of a local region depends only on its input gate. By exploring spatial-temporal relations and using this attention unit, our model attends more accurately and thus achieves better action recognition performance. We evaluate the proposed model on three datasets: UCF101, HMDB51 and Hollywood2, and the results show that it outperforms other attention models with significant improvements.
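To make the gate-based attention idea concrete, below is a minimal sketch, assuming PyTorch; the module name `GateAttentionUnit`, the projection layers `w_xi`/`w_hi`, the mean pooling of the gate, and all dimensions are illustrative assumptions, not the authors' implementation. It shows the core idea from the abstract: a region's attention weight is derived only from its LSTM-style input gate, and the attended feature drives the recurrent update.

```python
# A minimal sketch (not the paper's code) of an attention unit derived from an
# LSTM input gate: each local region is scored by its input-gate activation,
# the scores are normalized into attention weights, and the weighted feature
# feeds a standard LSTM cell. Layer names and pooling choice are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GateAttentionUnit(nn.Module):
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        # standard LSTM cell applied to the attended feature
        self.lstm_cell = nn.LSTMCell(feat_dim, hidden_dim)
        # input-gate projections used only to score each local region
        self.w_xi = nn.Linear(feat_dim, hidden_dim)
        self.w_hi = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, regions, h, c):
        # regions: (batch, K, feat_dim) local region features of one frame
        # h, c:    (batch, hidden_dim) previous recurrent state
        # input gate per region, from its feature and the previous hidden state
        i_gate = torch.sigmoid(self.w_xi(regions) + self.w_hi(h).unsqueeze(1))
        # a region's importance depends only on its input gate
        scores = i_gate.mean(dim=-1)                 # (batch, K)
        alpha = F.softmax(scores, dim=-1)            # attention weights
        # attended feature: attention-weighted sum of region features
        attended = (alpha.unsqueeze(-1) * regions).sum(dim=1)
        # recurrent update with the attended feature
        h, c = self.lstm_cell(attended, (h, c))
        return alpha, h, c
```

In use, one such unit would be applied per frame over the sequence, carrying `(h, c)` forward so the attention at each step reflects the spatial-temporal context accumulated so far.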
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Mingxing Zhang, Yang Yang, Yanli Ji, Ning Xie, Fumin Shen