Article code: 534367
Journal code: 870247
Publication year: 2016
English article: 10 pages
Full-text format: PDF
English title of the ISI article
Weighted multi-view key-frame extraction
Keywords
Key-frame extraction, Spectral clustering, Multi-view clustering
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition
English abstract


• We propose an efficient method for key-frame extraction where different image descriptors (views) are combined to capture different aspects of video frames.
• A weighted multi-view clustering algorithm based on Convex Mixture Models is employed to automatically assign a weight to each descriptor.
• A similarity matrix is built using these weights and is used as input to a spectral clustering algorithm to provide the final partitioning of the frames into groups.
• To the best of our knowledge, this is the first key-frame extraction method that is capable of combining several image descriptors and estimating the importance of each descriptor.
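The highlights describe building a single similarity matrix from several per-descriptor (per-view) similarity matrices using the estimated weights. The following is a minimal Python sketch of that combination step only, under stated assumptions: the weights are supplied by the caller rather than estimated with the weighted multi-view Convex Mixture Model clustering the authors use; each view's similarity is a Gaussian (RBF) kernel on Euclidean distances with a hand-picked width `sigma`; and the function names `gaussian_similarity` and `composite_similarity` are hypothetical. It is an illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_similarity(features: np.ndarray, sigma: float) -> np.ndarray:
    """Pairwise Gaussian (RBF) similarity between the frames of one view.

    features: (n_frames, dim) array of descriptors for a single view.
    sigma: kernel width (an assumed, hand-picked value in this sketch).
    """
    sq_norms = np.sum(features ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b,
    # clipped at zero to absorb floating-point round-off.
    sq_dists = np.maximum(
        sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T, 0.0
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def composite_similarity(views, weights, sigmas) -> np.ndarray:
    """Weighted sum of per-view similarity matrices (weights given, not learned)."""
    if not (len(views) == len(weights) == len(sigmas)):
        raise ValueError("views, weights and sigmas must have the same length")
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    n_frames = views[0].shape[0]
    combined = np.zeros((n_frames, n_frames))
    for w, features, sigma in zip(weights, views, sigmas):
        combined += w * gaussian_similarity(features, sigma)
    return combined
```

Because the weights are normalized to sum to one and each per-view kernel equals 1 on its diagonal, the composite matrix also has a unit diagonal and entries in (0, 1], so it can be passed directly to a spectral clustering step as an affinity matrix.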

The extraction of representative key-frames from video shots is very important in video processing and analysis, since it forms the basis for several tasks such as video shot summarization, browsing and retrieval, as well as high-level video segmentation. The extracted key-frames should capture a large percentage of the information in a shot's content while avoiding redundant visual information among themselves. Clustering or segmentation methods are usually employed to extract key-frames. A major difficulty is the large variety in the visual content of videos: using a single image descriptor (color, texture, etc.) to extract key-frames is not always effective, since no single descriptor outperforms the others in all videos. To tackle this problem, we propose an approach for the weighted fusion of several descriptors that automatically estimates the weight of each descriptor. The weights reflect the relevance of each descriptor for the specific video shot and are used to form a composite similarity matrix as the weighted sum of the similarity matrices of the individual descriptors. This matrix is then used as input to a spectral clustering algorithm that partitions the shot frames into groups. Finally, the medoid frame of each group is selected as a key-frame. Numerical experiments on a variety of videos demonstrate that our method is capable of efficiently summarizing video shots regardless of the characteristics of their visual content.
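The abstract's final two steps (spectral clustering of the composite similarity matrix, then choosing each group's medoid as its key-frame) can be sketched as follows. This sketch assumes the normalized spectral clustering of Ng, Jordan and Weiss (the top-k eigenvectors of D^-1/2 S D^-1/2, row-normalized, then k-means); the abstract says only "a spectral clustering algorithm", so that variant is an assumption. The number of key-frames `k` is passed in explicitly rather than estimated, the medoid is taken as the group member with the largest total similarity to the rest of its group, and the helper names `spectral_embedding`, `kmeans` and `select_key_frames` are hypothetical.

```python
import numpy as np


def spectral_embedding(S: np.ndarray, k: int) -> np.ndarray:
    """Rows of the top-k eigenvectors of D^-1/2 S D^-1/2, scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))  # assumes every frame has positive degree
    L = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    U = eigvecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        dists = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def select_key_frames(S: np.ndarray, k: int):
    """Cluster frames spectrally and return (sorted key-frame indices, labels)."""
    labels = kmeans(spectral_embedding(S, k), k)
    key_frames = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        # Medoid: the member most similar, in total, to the rest of its group.
        within = S[np.ix_(members, members)].sum(axis=1)
        key_frames.append(int(members[np.argmax(within)]))
    return sorted(key_frames), labels
```

For example, `select_key_frames(S, 3)` on a 12-frame similarity matrix returns three frame indices, one per cluster, together with the cluster label of every frame. A small NumPy k-means is used here to keep the sketch dependency-light and deterministic; a library routine such as scikit-learn's `SpectralClustering(affinity="precomputed")` could be substituted for the clustering step.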

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 72, 1 March 2016, Pages 52–61
Authors
, , ,