Spatial and temporal scoring for egocentric video summarization

Article ID	Journal	Published Year	Pages	File Type
4948627	Neurocomputing	2016	22 Pages	PDF

Abstract

We present a summarization approach for egocentric video. Given hours of video, the proposed method produces a compact storyboard summary of the camera wearer's day. In contrast to traditional keyframe selection techniques, the resulting summary focuses on the most important video shots, namely those with high stable salience, discriminativeness, and representativeness. To accomplish this, we use egocentric salience cues, motion cues, and a selection model to capture, respectively, the stable salience weight, the discriminative weight, and the representative weight of each video shot. We then combine these weights in a unified framework to predict an importance score for each shot, and the storyboard is assembled from the shots selected by this score. Critically, the approach is neither camera-wearer-specific nor object-specific: the learned importance metric need not be trained for a given user or context, and it can predict the importance of shots that have never been seen before. Experimental results on three video datasets spanning various genres demonstrate that the proposed approach outperforms several state-of-the-art methods.
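The abstract does not state how the three per-shot weights are fused or how many shots the storyboard keeps. The Python sketch below only illustrates the general score-then-select step, under explicit assumptions: the three weights are already computed for each shot, they are fused by a hypothetical weighted sum with coefficients `alpha`, `beta`, and `gamma`, and a hypothetical budget of `k` shots is kept and shown in temporal order. The names `Shot`, `importance`, and `select_storyboard` are illustrative and are not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One video shot with precomputed weights (hypothetical inputs)."""
    shot_id: int              # temporal index of the shot in the video
    salience: float           # stable salience weight
    discrimination: float     # discriminative weight
    representativeness: float # representative weight


def importance(shot: Shot, alpha: float = 1 / 3, beta: float = 1 / 3,
               gamma: float = 1 / 3) -> float:
    """Assumed fusion rule: a weighted linear sum of the three weights."""
    return (alpha * shot.salience
            + beta * shot.discrimination
            + gamma * shot.representativeness)


def select_storyboard(shots: list[Shot], k: int, **weights: float) -> list[Shot]:
    """Keep the k highest-scoring shots, then restore temporal order."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank all shots by the (assumed) fused importance score, highest first.
    ranked = sorted(shots, key=lambda s: importance(s, **weights), reverse=True)
    # A storyboard reads chronologically, so re-sort the selected shots by index.
    return sorted(ranked[:k], key=lambda s: s.shot_id)
```

For example, with four shots and equal coefficients, `select_storyboard(shots, 2)` returns the two shots with the largest fused scores, listed in the order they occur in the video. Passing `alpha=1.0, beta=0.0, gamma=0.0` ranks by salience alone, which makes it easy to see how each weight shifts the selection.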

Keywords

Video summarization; Sparse coding