Hierarchical attention-based multimodal fusion for video captioning

Article code	Journal code	Publication year	English article	Full-text version
10151190	1666107	2018	30-page PDF	Free download

Keywords

Attention Mechanism; Multi-modal

Related topics

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence

Abstract

Attention-based encoder-decoder models have shown great success in video captioning. Recent multi-modal video captioning work has mainly focused on applying the attention mechanism to all modalities and fusing them at the same level. However, the connections among specific modalities have not been investigated in the fusion process. In this paper, the expressivity of each single modality is first investigated. Given the characteristics of the attention mechanism, instance-level visual content is exploited to refine the temporal features. Then, a semantic detection architecture based on CNN+RNN is applied to the spatiotemporal content to exploit the correlations between semantic labels and obtain a better semantic representation of the video. Finally, a hierarchical attention-based multimodal fusion model for video captioning is proposed that jointly considers the intrinsic properties of the multimodal features. Experimental results on the MSVD and MSR-VTT datasets show that the proposed method achieves competitive performance compared with related video captioning methods.
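
To make the contrast between same-level and hierarchical fusion concrete, here is a minimal sketch of a two-level attention scheme in plain Python. It is not the model from the paper, whose details the abstract does not give. The dot-product scoring, the function names (`attend`, `hierarchical_fusion`), the modality names, and the toy feature vectors are all assumptions chosen for illustration. Level 1 attends over the features inside each modality, and level 2 attends over the resulting per-modality context vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Soft attention: weight each feature vector by softmax(query . feature).

    Dot-product scoring is an assumption made for this sketch.
    """
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return context, weights


def hierarchical_fusion(query, modalities):
    """Two-level fusion (illustrative only).

    Level 1: attend within each modality to get one context vector each.
    Level 2: attend over those context vectors to get the fused vector.
    """
    names = list(modalities)
    contexts = [attend(query, modalities[name])[0] for name in names]
    fused, modality_weights = attend(query, contexts)
    return fused, dict(zip(names, modality_weights))


# Hypothetical decoder state and toy 2-dimensional features.
query = [1.0, 0.0]
modalities = {
    "temporal": [[1.0, 0.0], [0.0, 1.0]],
    "semantic": [[0.5, 0.5]],
}
fused, weights = hierarchical_fusion(query, modalities)
```

In this toy setup, `weights` holds one attention weight per modality, so the decoder state decides how much each modality contributes after the within-modality attention has already been applied. A same-level scheme would instead pool all feature vectors into a single list and call `attend` once.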

Publisher

Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 315, 13 November 2018, Pages 362-370

Authors

Chunlei Wu, Yiwei Wei, Xiaoliang Chu, Sun Weichen, Fei Su, Leiquan Wang
