کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
6937446 | 1449736 | 2018 | 29 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
VideoLSTM convolves, attends and flows for action recognition
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
چشم انداز کامپیوتر و تشخیص الگو
پیش نمایش صفحه اول مقاله

چکیده انگلیسی
We present VideoLSTM for end-to-end sequence learning of actions in video. Rather than adapting the video to the peculiarities of established recurrent or convolutional architectures, we adapt the architecture to fit the requirements of the video medium. Starting from the soft-Attention LSTM, VideoLSTM makes three novel contributions. First, video has a spatial layout. To exploit the spatial correlation we hardwire convolutions in the soft-Attention LSTM architecture. Second, motion not only informs us about the action content, but also guides better the attention towards the relevant spatio-temporal locations. We introduce motion-based attention. And finally, we demonstrate how the attention from VideoLSTM can be exploited for action localization by relying on the action class label and temporal attention smoothing. Experiments on UCF101, HMDB51 and THUMOS13 reveal the benefit of the video-specific adaptations of VideoLSTM in isolation as well as when integrated in a combined architecture. It compares favorably against other LSTM architectures for action classification and especially action localization.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computer Vision and Image Understanding - Volume 166, January 2018, Pages 41-50
Journal: Computer Vision and Image Understanding - Volume 166, January 2018, Pages 41-50
نویسندگان
Zhenyang Li, Kirill Gavrilyuk, Efstratios Gavves, Mihir Jain, Cees G.M. Snoek,