Article ID: 537389
Journal: Signal Processing: Image Communication
Published Year: 2015
Pages: 13
File Type: PDF
Abstract

• A 2D/3D key-frame extraction framework driven by aggregated saliency maps.
• A method to compute aggregated saliency maps in 3D video using attention models.
• Optimal key-frame extraction taking into account different visual saliency regions.
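The highlights above centre on aggregating several attention-model feature maps (including a depth map for 3D content) into one saliency map. A minimal sketch of such an aggregation, assuming a simple normalised weighted sum (the function name, weighting scheme, and normalisation are illustrative, not the authors' exact method):

```python
import numpy as np

def aggregate_saliency(feature_maps, weights=None):
    """Combine per-model feature maps into one aggregated saliency map.

    feature_maps: list of 2D arrays of equal shape (e.g. intensity,
    colour, motion, depth). Each map is min-max normalised before the
    weighted sum so no single attention model dominates by scale alone.
    """
    if weights is None:
        weights = [1.0] * len(feature_maps)
    agg = np.zeros_like(feature_maps[0], dtype=float)
    for fm, w in zip(feature_maps, weights):
        fm = fm.astype(float)
        rng = fm.max() - fm.min()
        norm = (fm - fm.min()) / rng if rng > 0 else np.zeros_like(fm)
        agg += w * norm
    return agg / sum(weights)
```

For 3D video, the depth feature map simply enters this list alongside the 2D feature maps, which is how the framework processes both content types uniformly.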

This paper proposes a generic framework for extraction of key-frames from 2D or 3D video sequences, relying on a new method to compute 3D visual saliency. The framework comprises the following novel aspects that distinguish this work from previous ones: (i) the key-frame selection process is driven by an aggregated saliency map, computed from various feature maps, which in turn correspond to different visual attention models; (ii) a method for computing aggregated saliency maps in 3D video is proposed and validated using fixation density maps, obtained from ground-truth eye-tracking data; (iii) 3D video content is processed within the same framework as 2D video, by including a depth feature map into the aggregated saliency. A dynamic programming optimisation algorithm is used to find the best set of K frames that minimises the dissimilarity error (i.e., maximises similarity) between the original video shots of size N > K and those reconstructed from the key-frames. Using different performance metrics and publicly available databases, the simulation results demonstrate that the proposed framework outperforms similar state-of-the-art methods and achieves performance comparable to other quite different approaches. Overall, the proposed framework is validated for a wide range of visual content and has the advantage of being independent from any specific visual saliency model or similarity metric.
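The dynamic programming step described in the abstract can be sketched as follows. This is a generic formulation, not the paper's exact algorithm: it assumes each non-key frame is reconstructed from the most recent key frame, and that `dist` is whatever frame dissimilarity measure is plugged in (the framework is stated to be metric-independent):

```python
def select_key_frames(frames, K, dist):
    """Pick K of N key frames minimising total reconstruction error.

    Each non-key frame is approximated by the most recent key frame,
    so the cost of making frame j the key for segment [j, e] is
    sum(dist(frames[j], frames[t]) for t in j..e).  D[k][e] is the
    best error covering the first e frames with k keys, solved
    bottom-up, then the optimal key indices are recovered by backtracking.
    """
    N = len(frames)
    # seg[j][e]: cost of covering frames j..e with key frame j
    seg = [[0.0] * N for _ in range(N)]
    for j in range(N):
        acc = 0.0
        for e in range(j, N):
            acc += dist(frames[j], frames[e])
            seg[j][e] = acc
    INF = float("inf")
    D = [[INF] * (N + 1) for _ in range(K + 1)]
    choice = [[-1] * (N + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for e in range(1, N + 1):        # first e frames covered
            for j in range(k - 1, e):    # index of the last key frame
                c = D[k - 1][j] + seg[j][e - 1]
                if c < D[k][e]:
                    D[k][e] = c
                    choice[k][e] = j
    # backtrack the chosen key-frame indices
    keys, e = [], N
    for k in range(K, 0, -1):
        j = choice[k][e]
        keys.append(j)
        e = j
    return sorted(keys)
```

The exhaustive segment-cost table makes this O(N^2) in memory; it is meant only to make the optimisation objective concrete, with the dissimilarity measure left as a caller-supplied parameter, mirroring the framework's independence from any particular similarity metric.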
