On the development of an automatic voice pleasantness classification and intensity estimation system

Article ID	Journal	Published Year	Pages	File Type
563104	Computer Speech & Language	2013	14 Pages	PDF

Abstract

In the last few years, the number of systems and devices that use voice-based interaction has grown significantly. For continued use of these systems, the interface must be reliable and pleasant in order to provide an optimal user experience. However, there are currently very few studies that try to evaluate how pleasant a voice is from a perceptual point of view when the final application is a speech-based interface. In this paper, we present an objective definition of voice pleasantness based on the composition of a representative feature subset, and a new automatic voice pleasantness classification and intensity estimation system. Our study is based on a database composed of European Portuguese female voices, but the methodology can be extended to male voices or to other languages. In the objective performance evaluation, the system achieved a 9.1% error rate for voice pleasantness classification and a 15.7% error rate for voice pleasantness intensity estimation.

► Definition of voice pleasantness and description of a conceptual study framework.
► Presentation of a voice pleasantness classification and intensity estimation system.
► Machine learning approach achieved a 9.1% error rate for voice pleasantness classification.
► Feature selection process provided an optimized feature vector with few dimensions.
► Findings can help to improve TTS output or assist in voice talent selection.

Keywords

Text-to-speech synthesis