Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise

Article ID	Journal	Published Year	Pages	File Type
10368538	Computer Speech & Language	2014	17 Pages	PDF

Abstract

This paper studies the synthesis of speech over a wide vocal effort continuum and its perception in the presence of noise. Three types of speech are recorded and studied along the continuum: breathy, normal, and Lombard speech. Corresponding synthetic voices are created by training and adapting the statistical parametric speech synthesis system GlottHMM. Natural and synthetic speech along the continuum is assessed in listening tests that evaluate the intelligibility, quality, and suitability of speech in three different realistic multichannel noise conditions: silence, moderate street noise, and extreme street noise. The evaluation results show that the synthesized voices with varying vocal effort are rated similarly to their natural counterparts in terms of both intelligibility and suitability.

Keywords

Vocal effort; Adaptation; Lombard speech; Statistical parametric speech synthesis; Intelligibility