Article ID: 6856523
Journal: Information Sciences
Published Year: 2018
Pages: 16 Pages
File Type: PDF
Abstract
Despite impressive achievements in image processing and artificial intelligence over the past decade, understanding actions in video remains a challenge. However, the intensive development of 3D computer vision in recent years has opened new research opportunities in pose-based action detection and recognition. Taking advantage of depth cameras such as the Microsoft Kinect sensor, we develop an effective approach to the in-depth analysis of indoor actions using skeleton information, with two major contributions: skeleton-based feature extraction and topic model-based learning. Geometric features, i.e., joint distance, joint angle, and joint-plane distance, are calculated in the spatio-temporal dimension. These features are merged into two types, called pose and transition features, which are then passed to codebook construction, where k-means clustering converts the sparse features into visual words. An efficient hierarchical model based on the Pachinko Allocation Model is developed to describe the full feature-poselet-action correlation. This model has the potential to uncover more hidden poselets, which are recognized as valuable information and help differentiate pose-sharing actions. Experimental results on several well-known datasets, such as MSR Action 3D, MSR Daily Activity 3D, Florence 3D Action, UTKinect-Action 3D, and NTU RGB+D Action Recognition, demonstrate the high recognition accuracy of the proposed method, which outperforms state-of-the-art methods on most of these benchmarks.
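The three geometric features and the codebook step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of joints, the deterministic centroid initialization, and the plain Lloyd-style k-means loop are all assumptions made for illustration, and the abstract does not specify how features are normalized or combined over time.

```python
import math

# Small 3D vector helpers (joints are assumed to be (x, y, z) tuples).
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def joint_distance(p, q):
    """Euclidean distance between two joints (or two feature vectors)."""
    return norm(sub(p, q))

def joint_angle(a, b, c):
    """Angle in radians at joint b between segments b->a and b->c."""
    u, v = sub(a, b), sub(c, b)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding

def joint_plane_distance(p, a, b, c):
    """Unsigned distance from joint p to the plane through joints a, b, c."""
    n = cross(sub(b, a), sub(c, a))
    return abs(dot(sub(p, a), n)) / norm(n)

def nearest(p, centroids):
    """Index of the closest centroid, i.e. the visual word assigned to p."""
    return min(range(len(centroids)),
               key=lambda i: joint_distance(p, centroids[i]))

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; the first k points seed the centroids
    (an assumption chosen to keep the sketch deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

# Hypothetical usage: build a 2-word codebook from toy 1D feature vectors,
# then quantize a new feature vector into a visual word index.
codebook = kmeans([(0.0,), (10.0,), (0.2,), (9.8,)], k=2)
word = nearest((0.3,), codebook)
```

In this sketch each feature vector is mapped to the index of its nearest centroid, which plays the role of a visual word; a sequence of such indices would then be the input to a topic model such as the Pachinko Allocation Model mentioned above.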
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence