| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 526904 | Image and Vision Computing | 2013 | 11 Pages | |
Recognizing affect from both face and body gestures has recently attracted increasing attention. However, efficient and effective features for describing the dynamics of the face and gestures in real-time automatic affect recognition are still lacking. In this paper, we combine local motion and appearance features in a novel framework to model the temporal dynamics of face and body gestures. The proposed framework employs MHI-HOG and Image-HOG features, through temporal normalization or bag of words, to capture motion and appearance information. MHI-HOG stands for Histogram of Oriented Gradients (HOG) on the Motion History Image (MHI). It captures the motion direction and speed of a region of interest as an expression evolves over time. Image-HOG captures the appearance information of the corresponding region of interest. The temporal normalization method explicitly addresses the time resolution issue in video-based affect recognition. To implicitly model the local temporal dynamics of an expression, we further propose a bag of words (BOW) based representation for both MHI-HOG and Image-HOG features. Experimental results demonstrate promising performance compared with the state of the art. Recognition accuracy improves significantly over the frame-based approach, which does not consider the underlying temporal dynamics.
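To make the MHI-HOG and Image-HOG idea concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the common frame-differencing form of the Motion History Image (moving pixels are set to a duration `tau`, all others decay by one per frame) and a simplified HOG without block normalization. The function names, `tau`, `diff_thresh`, the cell grid, and the bin count are illustrative assumptions that do not come from the article.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=10, diff_thresh=0.1):
    """One MHI step (assumed form): moving pixels are set to tau,
    all other pixels decay by 1 toward 0."""
    moving = np.abs(frame - prev_frame) > diff_thresh
    return np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))

def hog(image, cells=(2, 2), bins=8):
    """Simplified HOG: per-cell, magnitude-weighted histogram of
    unsigned gradient orientation, L2-normalized as one vector."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    h, w = image.shape
    ch, cw = h // cells[0], w // cells[1]
    feats = []
    for i in range(cells[0]):
        for j in range(cells[1]):
            sl = (slice(i * ch, (i + 1) * ch), slice(j * cw, (j + 1) * cw))
            feats.append(np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins))
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-9)

def mhi_hog_and_image_hog(frames, tau=10):
    """Return (MHI-HOG, Image-HOG) for a list of grayscale frames
    covering one region of interest."""
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev, cur in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, cur, tau=tau)
    # HOG on the MHI encodes motion; HOG on the last frame encodes appearance.
    return hog(mhi), hog(frames[-1])
```

In this sketch, recently moved pixels hold larger MHI values than older ones, so the gradient of the MHI points along the direction of motion and its steepness reflects speed. Applying HOG to the MHI is what lets the descriptor summarize both.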
Highlights
► We develop MHI-HOG and Image-HOG to capture motion and appearance information in real time.
► We propose a new algorithm to segment expression cycles based on Motion Area and Neutral Divergence.
► We propose two affect recognition approaches: temporal normalization and bag of words.
► We recognize both face and body gesture modalities from a single sensorial channel.
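The two sequence-level representations named above can be sketched as follows, under stated assumptions. Temporal normalization is interpreted here as linearly resampling a variable-length sequence of per-frame feature vectors to a fixed number of frames. The bag of words step assigns each frame's feature to its nearest codeword in a codebook that is assumed to have been learned beforehand (for example by k-means). The function names, `n_frames`, and the codebook are hypothetical, and the article's actual procedure may differ.

```python
import numpy as np

def temporal_normalize(features, n_frames=8):
    """Resample a (T, D) feature sequence to (n_frames, D) by linear
    interpolation along time, so clips of any length become comparable."""
    features = np.asarray(features, dtype=float)
    t_src = np.linspace(0.0, 1.0, len(features))
    t_dst = np.linspace(0.0, 1.0, n_frames)
    cols = [np.interp(t_dst, t_src, features[:, d])
            for d in range(features.shape[1])]
    return np.stack(cols, axis=1)

def bow_histogram(features, codebook):
    """Assign each frame feature to its nearest codeword (squared
    Euclidean distance) and return the normalized word histogram."""
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resampled sequence keeps frame order, so it models the timing of an expression explicitly. The histogram discards order and keeps only which local patterns occurred, which is why the abstract describes the BOW representation as modeling temporal dynamics implicitly.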
