Article ID: 533253
Journal: Pattern Recognition
Published Year: 2015
Pages: 12
File Type: PDF
Highlights

• Multimodal learning for facial expression recognition (FER) is proposed.
• The first attempt to do FER from the joint representation of texture and landmarks.
• The multimodal structure combines feature extraction and classification together.
• Structured regularization is used to enforce the sparsity of different modalities.

Abstract

In this paper, multimodal learning for facial expression recognition (FER) is proposed. The method makes the first attempt to learn a joint representation from two complementary modalities of facial images: texture and landmarks. To learn the representation of each modality as well as the correlations and interactions between modalities, structured regularization (SR) is employed to enforce modality-specific sparsity and density. By introducing SR, the full range of facial expression cues is taken into account, so the method can handle subtle expressions and remains robust to varying facial-image inputs. With the proposed multimodal learning network, the joint representation learned from multimodal inputs is better suited to FER. Experimental results on the CK+ and NVIE databases demonstrate the superiority of the proposed method.
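To illustrate the structured-regularization idea the abstract describes, a common formulation is a group-sparsity (group-lasso) penalty that treats each modality's weights as one block: the penalty drives whole modality blocks toward zero while leaving weights within an active block dense. This is a minimal sketch, not the paper's exact formulation; the block boundaries, weights, and `lam` value below are hypothetical.

```python
import numpy as np

def group_sparsity_penalty(weights, groups, lam=0.1):
    """Group-lasso regularizer: lam times the sum of the L2 norms of each
    modality's weight block. Blocks with an L2 norm of zero contribute
    nothing, so whole modalities can be switched off (modality-level
    sparsity) while surviving blocks stay dense."""
    return lam * sum(np.linalg.norm(weights[g]) for g in groups)

# Toy joint weight vector: first 4 entries for the texture modality,
# last 3 for the landmark modality (illustrative sizes only).
w = np.array([0.5, -0.2, 0.1, 0.3, 0.0, 0.0, 0.0])
groups = [slice(0, 4), slice(4, 7)]

# The landmark block is all zeros, so only the texture block is penalized.
penalty = group_sparsity_penalty(w, groups, lam=0.1)
print(penalty)
```

In training, such a penalty would be added to the classification loss, so minimizing the total objective trades expression-recognition accuracy against modality-level sparsity.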

Related Topics
Physical Sciences and Engineering › Computer Science › Computer Vision and Pattern Recognition