Combining unsupervised learning and discrimination for 3D action recognition

Article ID	Journal	Published Year	Pages	File Type
6959323	Signal Processing	2015	15 Pages	PDF

Abstract

Previous work on 3D action recognition has focused on hand-designed features, extracted either from depth videos or from 2D videos. In this work, we present an effective way to combine unsupervised feature learning with discriminative feature mining. Unsupervised feature learning allows us to extract spatio-temporal features from unlabeled video data, which avoids the cumbersome process of designing feature extractors by hand. We propose an ensemble approach in which each base learner is a discriminative multi-kernel-learning classifier, trained to learn an optimal combination of joint-based features. We compare our method, abbreviated EnMkl, with state-of-the-art methods on the MSRAction 3D dataset, where it outperforms earlier methods. Furthermore, we analyze the efficiency of our approach in a 3D action recognition system.
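
The ensemble idea summarized above can be illustrated with a small toy sketch. It is not the EnMkl implementation, and everything in it is an assumption made for illustration: the names (`BaseLearner`, `fit_ensemble`, `alignment_weights`), the RBF kernels, the use of kernel-target alignment as a simple stand-in for a real multi-kernel-learning solver, the kernel perceptron in place of an SVM, and the synthetic three-joint data. Each base learner sees a random subset of joints, weights one kernel per joint, and the ensemble takes a majority vote.

```python
import math
import random

def rbf(x, z, gamma=1.0):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def alignment_weights(samples, labels, joints):
    """Weight each joint's kernel by its clipped alignment with the label kernel y*y^T.
    Assumption: a simple heuristic used here instead of a true MKL optimizer."""
    n = len(samples)
    weights = []
    for j in joints:
        num, sq = 0.0, 0.0
        for a in range(n):
            for b in range(n):
                k = rbf(samples[a][j], samples[b][j])
                num += k * labels[a] * labels[b]
                sq += k * k
        # ||y y^T||_F equals n for labels in {-1, +1}
        weights.append(max(num / (math.sqrt(sq) * n), 0.0))
    total = sum(weights)
    if total == 0.0:
        return [1.0 / len(joints)] * len(joints)
    return [w / total for w in weights]

class BaseLearner:
    """Kernel perceptron over a weighted sum of per-joint kernels (hypothetical helper)."""
    def __init__(self, joints):
        self.joints = joints

    def kernel(self, x, z):
        return sum(w * rbf(x[j], z[j]) for w, j in zip(self.weights, self.joints))

    def score(self, x):
        return sum(a * y * self.kernel(s, x)
                   for a, y, s in zip(self.alpha, self.labels, self.samples) if a)

    def fit(self, samples, labels, epochs=10):
        self.weights = alignment_weights(samples, labels, self.joints)
        self.samples, self.labels = samples, labels
        self.alpha = [0] * len(samples)
        for _ in range(epochs):
            for i, (x, y) in enumerate(zip(samples, labels)):
                if y * self.score(x) <= 0:  # misclassified: add this sample as a support
                    self.alpha[i] += 1
        return self

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

def fit_ensemble(samples, labels, n_joints, n_learners=5, subset_size=2, seed=0):
    """Train each base learner on a random subset of joints."""
    rng = random.Random(seed)
    return [BaseLearner(rng.sample(range(n_joints), subset_size)).fit(samples, labels)
            for _ in range(n_learners)]

def predict_ensemble(learners, x):
    """Majority vote; use an odd number of learners to avoid ties."""
    return 1 if sum(l.predict(x) for l in learners) > 0 else -1

# Synthetic data: joints 0 and 1 carry class information, joint 2 does not.
samples, labels = [], []
for sign in (1, -1):
    for i in range(4):
        e = 0.1 * i
        n = 0.3 if i % 2 else -0.3
        samples.append([[sign * (1 + e), sign * (1 - e)],
                        [sign * 0.5, sign * (0.5 + e)],
                        [n, -n]])
        labels.append(sign)

learners = fit_ensemble(samples, labels, n_joints=3)
pred_pos = predict_ensemble(learners, [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]])
pred_neg = predict_ensemble(learners, [[-1.0, -1.0], [-0.5, -0.5], [0.0, 0.0]])
```

In this sketch the uninformative joint receives a weight close to zero, so each base learner relies on the discriminative joints in its subset. A real system would replace the synthetic vectors with learned spatio-temporal features and handle more than two action classes.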

Keywords

Human action recognition; Depth camera; Unsupervised learning; Multi-kernel learning; Ensemble learning