کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
10151190 | 1666107 | 2018 | 30 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
Hierarchical attention-based multimodal fusion for video captioning
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
هوش مصنوعی
پیش نمایش صفحه اول مقاله
چکیده انگلیسی
Attention based encoder-decoder models have shown a great success on video captioning. Recent multi-modal video captioning mainly focused on applying the attention mechanism to all modalities and fusing them in the same level. However, the connections among specific modalities have not been investigated in the fusion process. In this paper, the expressivity of uni-modal is firstly investigated. Due to the characteristic of attention mechanism, an instance-level of visual content is exploited to refine the temporal features. Then, a semantic detection architecture based on CNN+RNN is also employed on the spatiotemporal content to exploit the correlations between semantic labels for better video semantic representation. Finally, a hierarchical attention-based multimodal fusion model for video captioning is proposed by jointly considering the intrinsic properties of multimodal features. Experimental results on the MSVD and MSR-VTT datasets show that the proposed method has achieved competitive performance compared with the related video captioning methods.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Neurocomputing - Volume 315, 13 November 2018, Pages 362-370
Journal: Neurocomputing - Volume 315, 13 November 2018, Pages 362-370
نویسندگان
Chunlei Wu, Yiwei Wei, Xiaoliang Chu, Sun Weichen, Fei Su, Leiquan Wang,