Article ID | Journal | Published Year | Pages
---|---|---|---
6938408 | Journal of Visual Communication and Image Representation | 2018 | 12 Pages
Abstract
In this paper, we tackle storytelling as an egocentric sequence description problem. We propose a novel methodology that exploits information from temporally neighboring events, which matches the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting of a multi-input attention recurrent network. We also release the EDUB-SegDesc dataset, the first dataset for egocentric image sequence description, consisting of 1339 events with 3991 descriptions, from 55 days acquired by 11 people. Finally, we show that our proposal outperforms classical attentional encoder-decoder methods for video description.
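The multi-input attention fusion mentioned above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the additive (Bahdanau-style) attention form, the two modalities (image features of the current event and text features of the previous event), and all variable names and dimensions are assumptions made for demonstration. Each modality gets its own attention over time steps, conditioned on the decoder state, and the resulting context vectors are concatenated before being fed to the decoder.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, state, W_f, W_s, v):
    # additive attention: score each time step against the decoder state
    scores = np.tanh(features @ W_f + state @ W_s) @ v  # shape (T,)
    alpha = softmax(scores)                             # attention weights
    return alpha @ features                             # context vector

rng = np.random.default_rng(0)
T, d_img, d_txt, d_s = 5, 8, 6, 4          # illustrative dimensions
img = rng.standard_normal((T, d_img))      # e.g. frame features of the event
txt = rng.standard_normal((T, d_txt))      # e.g. features of a neighboring event
state = rng.standard_normal(d_s)           # current decoder hidden state

ctx_img = attend(img, state,
                 rng.standard_normal((d_img, d_s)),
                 rng.standard_normal((d_s, d_s)),
                 rng.standard_normal(d_s))
ctx_txt = attend(txt, state,
                 rng.standard_normal((d_txt, d_s)),
                 rng.standard_normal((d_s, d_s)),
                 rng.standard_normal(d_s))

# multimodal fusion by concatenating per-modality contexts
fused = np.concatenate([ctx_img, ctx_txt])
print(fused.shape)  # (14,)
```

In a full model the concatenated context would be projected and fed into the recurrent decoder at each generation step; here the random weights stand in for learned parameters.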
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Sergi Soler, Petia Radeva