Article ID: 6941621
Journal: Signal Processing: Image Communication
Published Year: 2018
Pages: 8
File Type: PDF
Abstract
We propose a repeated review deep learning model for image captioning in the image evidence review process. The model consists of two subnetworks: a convolutional neural network (CNN), which extracts image features, and a recurrent neural network (RNN), which decodes those features into captions. Unlike the traditional encoder-decoder model, our model combines the strengths of the two subnetworks by recalling visual information, and it introduces a multimodal layer that effectively fuses the image and caption representations. The proposed model has been validated on benchmark datasets (MSCOCO, Flickr). Experiments show that it performs well on BLEU-3 and BLEU-4 and, to some extent, surpasses the best models currently available, such as NIC and m-RNN.
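The abstract reports results on BLEU-3 and BLEU-4. As a hedged illustration of what those scores measure, the sketch below implements standard corpus-level BLEU-N: clipped n-gram precision for n = 1..N, combined by a uniform geometric mean and multiplied by a brevity penalty. It is not the authors' evaluation code. The function names `ngram_counts` and `corpus_bleu` are hypothetical, captions are assumed to be tokenized already, and no smoothing is applied, so scores may differ from those produced by the evaluation toolkits commonly used on MSCOCO and Flickr.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]],
                references: List[List[List[str]]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU-max_n with uniform weights and a brevity penalty.

    hypotheses[i] is one tokenized generated caption; references[i] is the
    list of tokenized ground-truth captions for the same image.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order n
    total = [0] * max_n    # n-grams in the hypotheses, per order n
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis (ties broken toward the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            # Clip each n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any order with zero matches drives BLEU to 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the modified precisions p_1..p_N with weights 1/N.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For BLEU-3 pass `max_n=3`; for BLEU-4 pass `max_n=4`. Clipped matches are summed over the whole corpus before the geometric mean is taken, rather than averaging per-sentence scores, which follows the original corpus-level definition of BLEU.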