Egocentric video description based on temporally-linked sequences

Article ID	Journal	Published Year	Pages	File Type
6938408	Journal of Visual Communication and Image Representation	2018	12 Pages	PDF

Abstract

In this paper, we tackle storytelling as an egocentric sequence description problem. We propose a novel methodology that exploits information from temporally neighboring events, which precisely matches the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting of a multi-input attention recurrent network. We also release the EDUB-SegDesc dataset, the first dataset for egocentric image sequence description. It consists of 1339 events with 3991 descriptions from 55 days acquired by 11 people. Finally, we show that our proposal outperforms classical attentional encoder-decoder methods for video description.
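The abstract does not spell out how the multi-input attention network fuses its inputs. As a rough illustration only, the sketch below shows one common way an attention-based decoder can consume two inputs at once: at a decoding step it computes a separate soft-attention context over each input sequence (here, hypothetical feature vectors for the current event and for the temporally preceding event) and concatenates the resulting context vectors. The function names (`softmax`, `attend`, `multi_input_context`), the dot-product scoring and the concatenation step are all assumptions made for this example; this is not the authors' architecture.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Dot-product soft attention: weighted average of `keys` by softmax(query . key)."""
    if not keys:
        raise ValueError("attention needs at least one input vector")
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def multi_input_context(query: Sequence[float],
                        inputs: Sequence[Sequence[Sequence[float]]]) -> Vector:
    """Hypothetical fusion: attend to each input sequence separately, then concatenate."""
    context: Vector = []
    for seq in inputs:
        context.extend(attend(query, seq))
    return context


# Hypothetical toy features: two frames of the current event, one vector
# summarizing the previous event, and a zero decoder state as the query.
current_event = [[1.0, 0.0], [0.0, 1.0]]
previous_event = [[0.5, 0.5]]
decoder_state = [0.0, 0.0]

ctx = multi_input_context(decoder_state, [current_event, previous_event])
print(ctx)  # uniform attention here, so both contexts are [0.5, 0.5]
```

In a full model, the concatenated context would be fed to the recurrent decoder at every step, so the description of the current event can draw on both the current frames and the temporally linked previous event.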

Keywords

Egocentric vision; Deep learning