Online view-invariant human action recognition using RGB-D spatio-temporal matrix

Article ID	Journal	Published Year	Pages	File Type
531835	Pattern Recognition	2016	12 Pages	PDF

Abstract

• We provide a promising vision-based view-invariant action recognition system for RGB-D videos.
• We use the Euclidean distance between spatio-temporal feature vectors, represented in a Spatio-Temporal Matrix (STM).
• We describe the local tendency of the STM using pyramid-structural bag-of-words (BoW-Pyramid) and train an SVM classifier.

We propose a novel approach for online recognition of actions under view changes using an RGB-D camera. Perspective effects and camera motion are considered difficult problems in action recognition: when two video sequences record the same action from different camera views, the resulting appearances of the action can be entirely different. Consequently, if we simply apply feature extraction methods to the raw video, we end up with totally different features. Recent studies have explored the stability of the self-similarities of an action sequence over time, an idea that has been put into practice in view-invariant action recognition. Instead of extracting spatio-temporal features for every frame and using these feature vectors directly, our study uses the Euclidean distances between spatio-temporal feature vectors, represented in a Spatio-Temporal Matrix (STM). To recognize the action, we describe the local tendency of the STM using pyramid-structural bag-of-words (BoW-Pyramid) and train an SVM as our classifier.
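The idea of replacing per-frame features with their pairwise Euclidean distances can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the per-frame features, so the function name `spatio_temporal_matrix`, the random 3-dimensional features, and the orthogonal transform standing in for a change of view are all assumptions made for illustration. The sketch only shows that a distance matrix of this kind is unchanged when every feature vector undergoes the same rotation, which is one reason distance-based representations can be more stable than raw features.

```python
import numpy as np


def spatio_temporal_matrix(features):
    """Return the T x T matrix of Euclidean distances between frame features.

    features: array of shape (T, D), one (hypothetical) feature vector per frame.
    """
    features = np.asarray(features, dtype=float)
    # Broadcast to shape (T, T, D): difference between every pair of frames.
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)

# Assumed stand-in data: 6 frames with 3-dimensional features.
frames = rng.normal(size=(6, 3))
stm = spatio_temporal_matrix(frames)

# Assumed stand-in for a view change: one orthogonal transform applied
# to every frame's feature vector.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
stm_rotated = spatio_temporal_matrix(frames @ q.T)

print(stm.shape)
print(np.allclose(stm, stm_rotated))
```

The resulting matrix is symmetric with a zero diagonal. In the approach described above, local patterns of such a matrix would then be summarized (here, by BoW-Pyramid) and passed to a classifier; those stages are not sketched because the abstract gives no further detail on them.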

Keywords

View-invariant; Action recognition; Self-similarity