Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
526788 | 869229 | 2012 | 14-page PDF | Free download |

Human action recognition is a promising yet non-trivial area of computer vision with many potential applications. Recent advances in bag-of-feature approaches have brought significant insight into recognizing human actions in complex contexts. It is, however, common practice in the literature to treat an action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits the robustness of traditional approaches in real-life scenarios. In this work, we propose and show that taking the global configuration of local features into account greatly improves recognition performance. We first introduce a novel feature selection process, the Sparse Hierarchical Bayes Filter, which selects only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning to human action analysis. That is, by representing a human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: the Dynamic Conditional Random Field, from a probabilistic perspective, and the Structural Support Vector Machine, from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, on which it is shown to be highly effective and robust compared with bag-of-feature methods.
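
The limitation of an orderless representation described above can be illustrated with a minimal sketch. This is not the paper's method (it implements neither the Sparse Hierarchical Bayes Filter nor either structured learner); it only assumes that local features have already been quantized into integer codeword IDs. The clip names, the codeword sequences, and both helper functions are hypothetical and exist purely for illustration.

```python
from collections import Counter


def bag_of_features(codeword_ids, vocab_size):
    """Orderless histogram: relative frequency of each codeword in the clip."""
    counts = Counter(codeword_ids)
    total = len(codeword_ids)
    if total == 0:
        return [0.0] * vocab_size
    return [counts.get(k, 0) / total for k in range(vocab_size)]


def adjacent_pair_counts(codeword_ids):
    """Counts of temporally adjacent codeword pairs, a very simple structural cue."""
    return Counter(zip(codeword_ids, codeword_ids[1:]))


# Hypothetical codeword sequences: the second clip is the first one played in reverse.
clip_a = [0, 0, 1, 2, 2]
clip_b = [2, 2, 1, 0, 0]

# The orderless histograms are identical, so a bag-of-features model cannot tell them apart.
same_histogram = bag_of_features(clip_a, 3) == bag_of_features(clip_b, 3)

# Even a crude ordering cue separates the two clips.
same_structure = adjacent_pair_counts(clip_a) == adjacent_pair_counts(clip_b)

print(same_histogram, same_structure)
```

The two clips contain the same features in opposite temporal order, which is the kind of configuration information a structured model can use and a plain histogram discards.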
Highlights
► We integrate global configuration of local features into action analysis.
► Local features are selected using Sparse Hierarchical Bayes Filter.
► Probabilistic action localization with Dynamic Conditional Random Field.
► Max-margin action localization with Structural Support Vector Machine.
► The structured framework is more effective and robust than orderless methods.
Journal: Image and Vision Computing - Volume 30, Issue 1, January 2012, Pages 1–14