Article ID: 558300
Journal: Computer Speech & Language
Published Year: 2014
Pages: 17
File Type: PDF
Abstract

• We propose neutral reference models to detect emotions in the fundamental frequency.
• A novel scheme based on functional data analysis is presented.
• The proposed approach achieves higher accuracies than the state-of-the-art method.
• The scheme is also employed to detect the most emotionally salient segments.
• The approach is validated at the sub-sentence level by employing a natural database.

This paper proposes the use of neutral reference models to detect local emotional prominence in the fundamental frequency. A novel approach based on functional data analysis (FDA) is presented, which aims to capture the intrinsic variability of F0 contours. The neutral models are represented by a basis of functions and the testing F0 contour is characterized by the projections onto that basis. For a given F0 contour, we estimate the functional principal component analysis (PCA) projections, which are used as features for emotion detection. The approach is evaluated with lexicon-dependent (i.e., one functional PCA basis per sentence) and lexicon-independent (i.e., a single functional PCA basis across sentences) models. The experimental results show that the proposed system can lead to accuracies as high as 75.8% in binary emotion classification, which is 6.2% higher than the accuracy achieved by a benchmark system trained with global F0 statistics. The approach can be implemented at sub-sentence level (e.g., 0.5 s segments), facilitating the detection of localized emotional information conveyed within the sentence. The approach is validated with the SEMAINE database, which is a spontaneous corpus. The results indicate that the proposed scheme can be effectively employed in real applications to detect emotional speech.
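The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: ordinary PCA on contours resampled to a fixed grid stands in for functional PCA, and the function names, the 50-point grid, the number of components, and the synthetic contours are all assumptions made for illustration.

```python
import numpy as np


def resample_contour(f0, n_points=50):
    """Linearly interpolate an F0 contour onto a fixed-length, normalized time grid."""
    f0 = np.asarray(f0, dtype=float)
    src = np.linspace(0.0, 1.0, len(f0))
    dst = np.linspace(0.0, 1.0, n_points)
    return np.interp(dst, src, f0)


def fit_neutral_basis(neutral_contours, n_components=3, n_points=50):
    """Estimate a mean contour and a PCA basis from neutral F0 contours.

    Assumption: discretized PCA approximates the functional PCA in the abstract.
    """
    X = np.stack([resample_contour(c, n_points) for c in neutral_contours])
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(f0, mean, basis):
    """Project a test contour onto the neutral basis; the scores act as features."""
    x = resample_contour(f0, basis.shape[1])
    return basis @ (x - mean)


# Synthetic data for illustration only (hypothetical, not from SEMAINE).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 80)
neutral = [
    120 + 10 * np.sin(2 * np.pi * t + rng.normal(0, 0.2)) + rng.normal(0, 1, t.size)
    for _ in range(20)
]
mean, basis = fit_neutral_basis(neutral)

test_contour = 150 + 40 * np.sin(2 * np.pi * t)
features = project(test_contour, mean, basis)
```

In a lexicon-dependent setup, one such basis would be fitted per sentence; in a lexicon-independent setup, a single basis would be fitted across sentences. The resulting `features` vector would then be passed to a binary classifier, which this sketch leaves out.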

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing