Article ID: 6863960
Journal: Neurocomputing
Published Year: 2018
Pages: 44
File Type: PDF
Abstract
In this paper, an eleven-layer Convolutional Neural Network with Visual Attention is proposed for facial expression recognition. The network consists of three components. First, local convolutional features of faces are extracted by a stack of ten convolutional layers. Second, regions of interest are automatically determined from these local features by the embedded attention model. Third, the local features in these regions are aggregated and used to infer the emotion label. The three components are integrated into a single network that can be trained end-to-end. Extensive experiments on four kinds of data (namely aligned frontal faces, faces in different poses, aligned unconstrained faces, and grouped unconstrained faces) show that the proposed method improves accuracy and yields informative visualizations. The visualizations show that the learned regions of interest are partly consistent with the locations of emotion-specific Action Units. This finding confirms the interpretation of the Facial Action Coding System and the Emotional Facial Action Coding System from a machine learning perspective.
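The abstract outlines a three-part architecture: a convolutional feature extractor, an attention model that scores spatial locations, and a classifier over the attention-weighted aggregate of local features. The sketch below illustrates this general pattern in PyTorch; the layer widths, the single 1x1-convolution attention head, the softmax-weighted pooling, and the input size are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of a CNN with spatial visual attention (assumed details).
import torch
import torch.nn as nn

class AttentionCNN(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        # Component 1: stack of ten convolutional layers extracting local features
        # (channel widths are placeholders, not the paper's configuration).
        layers, in_ch = [], 1
        for out_ch in (32, 32, 64, 64, 128, 128, 128, 256, 256, 256):
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # Component 2: attention model producing one score per spatial location.
        self.attention = nn.Conv2d(in_ch, 1, kernel_size=1)
        # Component 3: classifier on the attention-weighted aggregate feature.
        self.classifier = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        feats = self.features(x)                            # (B, C, H, W) local features
        scores = self.attention(feats)                      # (B, 1, H, W) attention scores
        weights = torch.softmax(scores.flatten(2), dim=-1)  # normalize over H*W locations
        pooled = (feats.flatten(2) * weights).sum(dim=-1)   # (B, C) aggregated features
        return self.classifier(pooled)                      # emotion logits

# Usage (assumed 48x48 grayscale input):
# logits = AttentionCNN()(torch.randn(4, 1, 48, 48))
```

Because the attention weights are produced inside the network, the whole pipeline is differentiable and can be trained end-to-end, matching the training scheme described in the abstract; the learned weight maps can also be visualized as the regions of interest.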
Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence