Article ID | Journal | Published Year | Pages |
---|---|---|---|
535559 | Pattern Recognition Letters | 2013 | 9 |
Temporal segmentation of successive actions in a long-term video sequence has been a long-standing problem in computer vision. In this paper, we propose a novel learning-based framework. Given a video sequence, only a few characteristic frames are selected by the proposed selection algorithm; the likelihood of these frames with respect to the trained action models is then computed in a pairwise manner, and the segmentation is finally obtained as the model sequence that maximizes the overall likelihood. The average accuracy on the IXMAS dataset reaches 80.5% at the frame level while using only 16.5% of all frames, with a computation time of 1.57 s per video (1160 frames on average).
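The abstract describes segmentation as finding the model sequence with maximum likelihood over the selected characteristic frames. The snippet below is a minimal sketch of that idea, assuming the per-frame log-likelihoods under each trained action model have already been computed (the `switch_penalty` term and the Viterbi-style dynamic program are illustrative assumptions, not details taken from the paper).

```python
import numpy as np

def segment_actions(log_lik, switch_penalty=1.0):
    """Assign an action model to each selected characteristic frame so that
    the total log-likelihood is maximized, with a penalty for switching
    models between consecutive frames (assumed regularizer, not from the paper)."""
    n_frames, n_models = log_lik.shape
    score = np.full((n_frames, n_models), -np.inf)
    back = np.zeros((n_frames, n_models), dtype=int)

    score[0] = log_lik[0]
    for t in range(1, n_frames):
        for m in range(n_models):
            trans = score[t - 1] - switch_penalty   # cost of switching from any model
            trans[m] = score[t - 1, m]              # staying with the same model is free
            best_prev = int(np.argmax(trans))
            score[t, m] = trans[best_prev] + log_lik[t, m]
            back[t, m] = best_prev

    # Backtrack the optimal model sequence (the segmentation).
    labels = np.zeros(n_frames, dtype=int)
    labels[-1] = int(np.argmax(score[-1]))
    for t in range(n_frames - 2, -1, -1):
        labels[t] = back[t + 1, labels[t + 1]]
    return labels

# Toy usage: 20 characteristic frames scored against 5 action models.
rng = np.random.default_rng(0)
print(segment_actions(rng.normal(size=(20, 5))))
```

Contiguous runs of the same label in the returned sequence correspond to the temporal segments of individual actions.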
Highlights
► Characteristic frames are selected in a video instead of using the entire sequence.
► A pairwise-frame representation is employed for action modeling and segmentation.
► Computation time is decreased since a smaller number of frames is used.
► Similar poses appearing in different actions are identified correctly.