Article ID: 6957826
Journal: Signal Processing
Published Year: 2018
Pages: 29
File Type: PDF
Abstract
Action recognition in videos, which contain complex semantic content, remains a challenging task in computer vision research. In this paper, we propose a novel attention mechanism that leverages the gate system of Long Short-Term Memory (LSTM) to compute the attention weights for action recognition. The proposed attention mechanism is embedded in a recurrent attention network that explores the spatial-temporal relations between different local regions in order to concentrate on the important ones. For more accurate attention, we derive a new attention unit from the standard LSTM unit so that the importance of a local region depends only on its input gate. By exploring spatial-temporal relations and using this attention unit, our model attends more accurately and thus achieves better action recognition performance. We evaluate the proposed model on three datasets, UCF101, HMDB51 and Hollywood2, and the results show that it outperforms other attention models by a significant margin.
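The idea of scoring local regions through an LSTM-style input gate can be illustrated with a minimal NumPy sketch. This is not the paper's attention unit: the abstract does not give its equations, so the function name `gate_attention`, the parameter names, the reduction of each gate vector to a scalar by averaging, and the softmax normalization are all assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Subtract the maximum for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def gate_attention(regions, h_prev, W_x, W_h, b):
    """Hypothetical input-gate attention over K local regions.

    regions: (K, D) features of K local regions in one frame
    h_prev:  (H,)   previous recurrent hidden state
    W_x:     (H, D) input-to-gate weights
    W_h:     (H, H) hidden-to-gate weights
    b:       (H,)   gate bias
    """
    # Standard LSTM input-gate form, evaluated once per region: (K, H).
    gates = sigmoid(regions @ W_x.T + h_prev @ W_h.T + b)
    # Assumption: collapse each gate vector to one scalar importance score.
    scores = gates.mean(axis=1)
    # Assumption: normalize scores over regions so the weights sum to 1.
    weights = softmax(scores)
    # Attention-weighted summary of the regions: (D,).
    context = weights @ regions
    return weights, context


rng = np.random.default_rng(0)
K, D, H = 4, 8, 6  # illustrative sizes, not taken from the paper
weights, context = gate_attention(
    rng.normal(size=(K, D)),
    rng.normal(size=H),
    rng.normal(size=(H, D)),
    rng.normal(size=(H, H)),
    np.zeros(H),
)
```

In this sketch the weight of a region is determined only by its gate activation, which mirrors the abstract's statement that importance depends on the input gate; how the actual model couples this with the recurrent network is not specified here.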
Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing