Article ID: 558102
Journal: Biomedical Signal Processing and Control
Published Year: 2011
Pages: 8 Pages
File Type: PDF
Abstract

Two new approaches to the feature extraction process for automatic stress and emotion classification in speech are proposed and examined. The first method uses the empirical mode decomposition (EMD) of speech into intrinsic mode functions (IMF) and calculates the average Rényi entropy for the IMF channels. The second method calculates the average spectral energy in the sub-bands of speech spectrograms and can be enhanced by anisotropic diffusion filtering of the spectrograms. For the second method, three types of sub-bands were examined: critical, Bark, and ERB. The performance of the new features was compared with the conventional mel frequency cepstral coefficients (MFCC). The modeling and classification process applied the classical GMM and KNN algorithms. The experiments used two databases containing natural speech: SUSAS (annotated with three different levels of stress) and the Oregon Research Institute (ORI) data (annotated with five different emotions: neutral, angry, anxious, dysphoric, and happy). For the SUSAS data, the best average recognition rate of 77% was obtained using spectrogram features calculated within ERB bands and combined with anisotropic filtering. For the ORI data, the best result of 53% was obtained with the same method but without anisotropic filtering. The GMM and KNN classifiers showed similar performance. These results indicate that spectrogram-based features are promising for stress recognition; however, further improvement is needed for emotion recognition.
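The abstract's second feature type, average spectral energy in ERB sub-bands of a speech spectrogram, can be sketched as below. This is a minimal illustration, not the paper's implementation: the ERB-number (Cams) scale formula is the standard Glasberg–Moore approximation, and all parameter values (band count, frame length, hop size) are assumptions chosen for the example.

```python
# Sketch: average spectral energy in ERB sub-bands of a spectrogram.
# Frame length, hop, and band count are illustrative assumptions.
import numpy as np

def erb_band_edges(f_min, f_max, n_bands):
    # ERB-number (Cams) scale: E(f) = 21.4 * log10(4.37 * f / 1000 + 1)
    e_min = 21.4 * np.log10(4.37 * f_min / 1000.0 + 1.0)
    e_max = 21.4 * np.log10(4.37 * f_max / 1000.0 + 1.0)
    e = np.linspace(e_min, e_max, n_bands + 1)
    # Invert back to Hz: edges equally spaced on the ERB scale
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_band_energies(signal, fs, n_bands=16, frame_len=512, hop=256):
    # Power spectrogram via a simple Hann-windowed short-time FFT
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    edges = erb_band_edges(50.0, fs / 2.0, n_bands)
    # Average power per ERB band, averaged over all frames
    feats = np.empty(n_bands)
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        feats[b] = spec[:, mask].mean() if mask.any() else 0.0
    return feats

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)  # stand-in for speech
features = erb_band_energies(x, fs)
print(features.shape)  # (16,)
```

The resulting fixed-length vector (one energy value per ERB band) is the kind of feature that could then be fed to a GMM or KNN classifier; the paper's anisotropic diffusion filtering step, applied to the spectrogram before band averaging, is omitted here.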

Related Topics
Physical Sciences and Engineering Computer Science Signal Processing