Article ID | Journal | Published Year | Pages |
---|---|---|---|
558300 | Computer Speech & Language | 2014 | 17 |
• We propose neutral reference models to detect emotions in the fundamental frequency.
• A novel scheme based on functional data analysis is presented.
• The proposed approach achieves higher accuracies than the state-of-the-art method.
• The scheme is also used to detect the most emotionally salient segments.
• The approach is validated at the sub-sentence level using a natural database.
This paper proposes the use of neutral reference models to detect local emotional prominence in the fundamental frequency (F0). A novel approach based on functional data analysis (FDA) is presented, which aims to capture the intrinsic variability of F0 contours. The neutral models are represented by a basis of functions, and a test F0 contour is characterized by its projections onto that basis. For a given F0 contour, we estimate the functional principal component analysis (PCA) projections, which are used as features for emotion detection. The approach is evaluated with lexicon-dependent (i.e., one functional PCA basis per sentence) and lexicon-independent (i.e., a single functional PCA basis across sentences) models. The experimental results show that the proposed system can reach accuracies as high as 75.8% in binary emotion classification, which is 6.2% higher than the accuracy achieved by a benchmark system trained with global F0 statistics. The approach can be implemented at the sub-sentence level (e.g., 0.5 s segments), facilitating the detection of localized emotional information conveyed within the sentence. The approach is validated with the SEMAINE database, which is a spontaneous corpus. The results indicate that the proposed scheme can be effectively employed in real applications to detect emotional speech.
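The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it approximates functional PCA by resampling each contour onto a common normalized time grid and running ordinary PCA (via SVD) on the sampled values, whereas a full FDA pipeline would typically smooth the contours with a function basis first. The function names, the 50-point grid, the three components, and the synthetic contours are all assumptions made for illustration.

```python
import numpy as np

def resample_contour(f0, n_points=50):
    """Linearly resample an F0 contour onto a normalized time axis [0, 1]."""
    f0 = np.asarray(f0, dtype=float)
    src = np.linspace(0.0, 1.0, len(f0))
    dst = np.linspace(0.0, 1.0, n_points)
    return np.interp(dst, src, f0)

def fit_neutral_basis(neutral_contours, n_components=3, n_points=50):
    """Estimate a mean contour and PCA basis from neutral F0 contours only."""
    X = np.vstack([resample_contour(c, n_points) for c in neutral_contours])
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(f0, mean, basis):
    """Project a contour onto the neutral basis; the scores act as features."""
    x = resample_contour(f0, basis.shape[1])
    return basis @ (x - mean)

# Synthetic data (hypothetical): neutral contours vary in level and range.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 80)
neutral = [
    120 + rng.normal(0, 5) + (10 + rng.normal(0, 2)) * np.sin(np.pi * t)
    for _ in range(20)
]
mean, basis = fit_neutral_basis(neutral)

# A contour with a raised level and wider range stands in for emotional speech.
emotional_test = 160 + 40 * np.sin(2 * np.pi * t)
scores_emotional = project(emotional_test, mean, basis)
```

In this sketch the scores returned by `project` play the role of the functional PCA projections described above, and they could be passed to any binary classifier. Fitting one basis per sentence would correspond to the lexicon-dependent setting, and one basis over all sentences to the lexicon-independent setting.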