Article code: 535551
Journal code: 870353
Publication year: 2013
English article: 9-page PDF
Full-text version: free download
English title of the ISI article
Spatio-temporal layout of human actions for improved bag-of-words action detection
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition
English abstract

We investigate how human action recognition can be improved by considering the spatio-temporal layout of actions. From the literature, we adopt a pipeline consisting of STIP features, a random forest to quantize the features into histograms, and an SVM classifier. Our goal is to detect 48 human actions, ranging from simple actions such as walk to complex actions such as exchange. Our contribution is to improve the performance of this pipeline by exploiting a novel spatio-temporal layout of the 48 actions. Here, each STIP feature in the video does not contribute to the histogram bins by a unity value, but rather by a weight given by its spatio-temporal probability. We propose 6 configurations of spatio-temporal layout, where the varied parameters are the coordinate system and the modeling of the action and its context. Our model of layout does not change any other parameter of the pipeline and requires no re-learning of the random forest. It increases the size of the resulting representation by only a factor of two, at a minimal additional computational cost of only a handful of operations per feature. Extensive experiments demonstrate that the layout is distinctive of actions that involve trajectories, (dis)appearance, kinematics, and interactions. The visualization of each action’s layout illustrates that our approach is indeed able to model the spatio-temporal patterns of each action. Each layout is experimentally shown to be optimal for a specific set of actions. Generally, the context has more effect than the choice of coordinate system. The most impressive improvements are achieved for complex actions involving items. For 43 out of 48 human actions, the performance is better or equal when spatio-temporal layout is included. In addition, we show that our method outperforms the state of the art on the IXMAS and UT-Interaction datasets.
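The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each feature has already been quantized to a codeword index, and that some layout model has already assigned each feature a probability of belonging to the action. The function name `weighted_bow_histogram`, its parameters, and the way the doubled representation is split into an action half (weight `p`) and a context half (weight `1 - p`) are all assumptions made for illustration; the abstract only states that the representation grows by a factor of two.

```python
from typing import List, Sequence


def weighted_bow_histogram(
    word_ids: Sequence[int],
    action_probs: Sequence[float],
    num_words: int,
) -> List[float]:
    """Build a bag-of-words histogram with per-feature weights.

    Hypothetical layout of the output (an assumption, not from the paper):
      bins [0, num_words)            accumulate p      (action part)
      bins [num_words, 2*num_words)  accumulate 1 - p  (context part)
    A plain bag-of-words model would instead add 1 to a single bin.
    """
    if len(word_ids) != len(action_probs):
        raise ValueError("word_ids and action_probs must have equal length")

    hist = [0.0] * (2 * num_words)
    for word, prob in zip(word_ids, action_probs):
        if not 0 <= word < num_words:
            raise ValueError(f"codeword index {word} out of range")
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability {prob} outside [0, 1]")
        hist[word] += prob                    # weighted vote for the action
        hist[num_words + word] += 1.0 - prob  # remaining mass goes to context
    return hist


# Four features, a codebook of three words (toy values).
word_ids = [0, 2, 2, 1]
action_probs = [1.0, 0.5, 0.25, 0.0]
hist = weighted_bow_histogram(word_ids, action_probs, num_words=3)
print(hist)
```

In this sketch each feature costs two additions and one subtraction, and the codebook itself is untouched, which is in the spirit of the abstract's claims about a handful of operations per feature and no re-learning of the quantizer.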

Highlights
► Using the spatio-temporal layout of human actions improves discrimination.
► The method is a weighting scheme on top of the popular bag-of-words model.
► No need to re-learn the action codebook.
► Tested on 1294 test videos, under varying recording conditions, with 195 variations per action: 19.2% improvement.
► Outperforms state-of-the-art on IXMAS and UT-Interaction datasets.

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 34, Issue 15, 1 November 2013, Pages 1861–1869
Authors
, ,