Article ID Journal Published Year Pages File Type
6937493 Computer Vision and Image Understanding 2017 12 Pages PDF
Abstract
We evaluate our approach on four distinct activity datasets, namely Hollywood Extended, MPII Cooking, Breakfast and CRIM13. It shows that the proposed system is able to align the scripted actions with the video data, that the learned models localize and classify actions in the datasets, and that they outperform any current state-of-the-art approach for aligning transcripts with video data.
Related Topics
Physical Sciences and Engineering Computer Science Computer Vision and Pattern Recognition
Authors
, , ,