Applying an analysis of acted vocal emotions to improve the simulation of synthetic speech

Article ID	Journal	Published Year	Pages	File Type
558400	Computer Speech & Language	2008	23 Pages	PDF

Abstract

All speech produced by humans includes information about the speaker, including conveying the emotional state of the speaker. It is thus desirable to include vocal affect in any synthetic speech where improving the naturalness of the speech produced is important. However, the speech factors which convey affect are poorly understood, and their implementation in synthetic speech systems is not yet commonplace. A prototype system for the production of emotional synthetic speech using a commercial formant synthesiser was developed based on vocal emotion descriptions given in the literature. This paper describes work to improve and augment this system, based on a detailed investigation of emotive material spoken by two actors (one amateur, one professional). The results of this analysis are summarised, and were used to enhance the existing emotion rules used in the speech synthesis system. The enhanced system was evaluated by naive listeners in a perception experiment, and the simulated emotions were found to be more realistic than in the original version of the system.

Keywords

Speech perception system evaluation Affect Speech analysis Emotion