Article ID: 563104
Journal: Computer Speech & Language
Published Year: 2013
Pages: 14
File Type: PDF
Abstract

In the last few years, the number of systems and devices that use voice-based interaction has grown significantly. For users to keep using these systems, the interface must be reliable and pleasant, providing an optimal user experience. However, there are currently very few studies that try to evaluate how pleasant a voice is from a perceptual point of view when the final application is a speech-based interface. In this paper we present an objective definition of voice pleasantness, based on the composition of a representative feature subset, and a new automatic system for voice pleasantness classification and intensity estimation. Our study is based on a database of European Portuguese female voices, but the methodology can be extended to male voices or to other languages. In the objective performance evaluation, the system achieved a 9.1% error rate for voice pleasantness classification and a 15.7% error rate for voice pleasantness intensity estimation.

► Definition of voice pleasantness and description of a conceptual study framework.
► Presentation of a voice pleasantness classification and intensity estimation system.
► A machine learning approach achieved a 9.1% error rate for voice pleasantness classification.
► The feature selection process yielded an optimized, low-dimensional feature vector.
► The findings can help improve TTS output or assist in voice talent selection.
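The abstract does not say which feature selection procedure or classifier produced the compact feature vector and the 9.1% classification error. Purely as an illustration of the general idea, here is a minimal sketch of one common wrapper approach, assuming greedy forward selection scored by the leave-one-out error of a nearest-centroid classifier. Both techniques, the function names `loo_nearest_centroid_error` and `forward_selection`, and the toy data are hypothetical stand-ins, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each row of X is one voice's acoustic feature vector,
# each entry of y is a perceptual label such as "pleasant" / "unpleasant".
Matrix = Sequence[Sequence[float]]


def loo_nearest_centroid_error(X: Matrix, y: Sequence[str], features: List[int]) -> float:
    """Leave-one-out error rate of a nearest-centroid classifier that only
    looks at the feature indices listed in `features` (illustrative stand-in)."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Build per-class centroids from every sample except sample i.
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for j in range(n):
            if j == i:
                continue
            acc = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1
        # Predict the class whose centroid is closest (squared Euclidean distance).
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            centroid = [v / counts[label] for v in acc]
            dist = sum((X[i][f] - c) ** 2 for f, c in zip(features, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label != y[i]:
            errors += 1
    return errors / n


def forward_selection(X: Matrix, y: Sequence[str], max_features: int) -> Tuple[List[int], float]:
    """Greedy forward selection: repeatedly add the feature that most lowers
    the leave-one-out error, stopping when no candidate improves it."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best_err = float("inf")
    while remaining and len(selected) < max_features:
        # Score each remaining feature added to the current subset;
        # ties go to the lower feature index.
        err, feat = min(
            (loo_nearest_centroid_error(X, y, selected + [f]), f) for f in remaining
        )
        if err >= best_err:
            break  # no candidate lowers the error, so keep the subset small
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err


if __name__ == "__main__":
    # Toy data (invented): feature 0 separates the classes,
    # feature 1 is noise, feature 2 is constant.
    X = [[0.0, 5.0, 1.0], [0.1, 1.0, 1.0], [0.2, 3.0, 1.0],
         [1.0, 4.0, 1.0], [1.1, 2.0, 1.0], [0.9, 5.0, 1.0]]
    y = ["unpleasant"] * 3 + ["pleasant"] * 3
    print(forward_selection(X, y, max_features=2))  # → ([0], 0.0)
```

On the toy data the search keeps only the informative feature and stops, which mirrors the "few dimensions" outcome described in the highlights; the numbers here are illustrative and unrelated to the paper's reported results.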
