Article ID: 530186
Journal: Journal of Visual Communication and Image Representation
Published Year: 2010
Pages: 9
File Type: PDF
Abstract

In this paper, we present a novel method for content adaptation and video summarization implemented entirely in the compressed domain. First, summarization of generic videos is modeled as the process of extracting human objects under various activities and events. Accordingly, frames are classified by fuzzy decision into five categories using two inter-frame measurements: shot changes (cuts and gradual transitions), motion activities (camera motion and object motion), and others. Second, human objects are detected using Haar-like features. From the detected human objects and the frame categories, an activity level is determined for each frame so that the summary adapts to the video content. Consecutive frames belonging to the same category are grouped to form one activity entry, treated as content of interest (COI), which converts the original video into a series of activities. An overall adjustable quota controls the size of the generated summary for efficient streaming. Given this quota, the frames selected for the summary are determined by evenly sampling the accumulated activity levels. Quantitative evaluations demonstrate the effectiveness and efficiency of the proposed approach, which offers a more flexible and general solution because domain-specific tasks such as accurate object recognition can be avoided.
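The quota-driven selection step can be illustrated with a minimal sketch. It assumes that each frame already carries a non-negative activity level and that "evenly sampling the accumulated activity levels" means picking the frames at which the running total crosses equally spaced targets. The function name `select_frames`, the midpoint targets, and the de-duplication rule are illustrative assumptions, not details taken from the paper.

```python
from bisect import bisect_left
from itertools import accumulate


def select_frames(activity, quota):
    """Pick up to `quota` frame indices, spaced evenly in cumulative activity.

    `activity` is a list of non-negative per-frame activity levels
    (hypothetical input; the paper derives these from compressed-domain
    features and detected human objects).
    """
    if quota <= 0 or not activity:
        return []
    cumulative = list(accumulate(activity))
    total = cumulative[-1]
    if total <= 0:
        return []  # no activity at all: nothing to sample
    step = total / quota
    picks = []
    for k in range(quota):
        # Assumption: aim at the midpoint of each equal share of activity.
        target = (k + 0.5) * step
        idx = min(bisect_left(cumulative, target), len(activity) - 1)
        # Several targets can fall inside one very active frame; keep it once.
        if not picks or picks[-1] != idx:
            picks.append(idx)
    return picks


if __name__ == "__main__":
    levels = [0, 0, 4, 4, 0, 0, 1, 1]
    print(select_frames(levels, quota=4))
```

Under this reading, stretches with high activity receive more of the quota than quiet stretches, and raising or lowering `quota` changes the summary length without changing the per-frame activity levels.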

Research highlights
► A novel compressed-domain method for content adaptation and video summarization.
► Frames are categorized via fuzzy decision using two inter-frame distance and similarity measurements.
► Activity levels are extracted and measured from low-level features and further weighted by detected human objects.
► An activity-driven strategy groups frames into a series of content-of-interest entries with consistent categories.
► Quantitative evaluations validate the effectiveness and efficiency of the proposed approach, which avoids accurate detection of semantic objects such as faces and is therefore more flexible and general.
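The activity-driven grouping of frames into content-of-interest entries can be sketched as a run-length grouping over per-frame category labels. The label strings and the `(category, first_frame, last_frame)` tuple layout below are hypothetical; the source only states that consecutive frames of the same category form one activity entry.

```python
from itertools import groupby


def group_activities(categories):
    """Merge consecutive frames with the same category into activity entries.

    Returns a list of (category, first_frame, last_frame) tuples with
    inclusive frame indices. The tuple layout is an assumption made for
    this sketch.
    """
    entries = []
    start = 0
    for category, run in groupby(categories):
        length = sum(1 for _ in run)  # number of frames in this run
        entries.append((category, start, start + length - 1))
        start += length
    return entries


if __name__ == "__main__":
    # Hypothetical per-frame labels, as a fuzzy classifier might emit them.
    labels = ["cut", "object", "object", "camera", "camera", "camera", "other"]
    for entry in group_activities(labels):
        print(entry)
```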

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition