Article ID: 565543
Journal: Speech Communication
Published Year: 2006
Pages: 16
File Type: PDF
Abstract

In this paper we present a new method for synthesizing multiple languages with the same voice, using HMM-based speech synthesis. Our approach, which we call HMM-based polyglot synthesis, consists of mixing speech data from several speakers in different languages to create a speaker- and language-independent (SI) acoustic model. We then adapt the resulting SI model to a specific speaker in order to create a speaker-dependent (SD) acoustic model. Using the SD model, it is possible to synthesize any of the languages used to train the SI model with the voice of that speaker, regardless of the speaker's own language. We show that our method outperforms methods based on phone mapping for both adaptation and synthesis. Furthermore, for languages not included during training, the performance of our approach equals or surpasses that of monolingual synthesizers based on any of the languages used to train the multilingual one. This means that our method can be used to create synthesizers for languages for which no speech resources are available.
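The two-stage pipeline described in the abstract (pool multilingual data into an SI model, then adapt it to a target speaker) can be sketched as follows. This is a toy illustration under loud assumptions: the function names are invented, and the "model" is just a global mean vector standing in for an HMM; real systems estimate HMM parameters and adapt them with techniques such as MLLR or MAP, which the paper's framework builds on.

```python
# Toy sketch of the HMM-based polyglot synthesis pipeline.
# Assumptions: function names are illustrative; a mean feature vector
# stands in for a full HMM acoustic model.

def train_si_model(corpora):
    """Pool speech frames from several speakers and languages into one
    speaker- and language-independent (SI) model.

    Here the 'model' is simply the global mean of all feature frames."""
    frames = [f for corpus in corpora for f in corpus["frames"]]
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def adapt_to_speaker(si_model, speaker_frames):
    """Adapt the SI model to a target speaker with a simple bias shift,
    a crude stand-in for MLLR-style adaptation: SD = SI + (speaker mean - SI).

    In this toy the result collapses to the speaker mean; a real MLLR
    transform is estimated per regression class and does not."""
    dim = len(si_model)
    spk_mean = [sum(f[d] for f in speaker_frames) / len(speaker_frames)
                for d in range(dim)]
    return [si_model[d] + (spk_mean[d] - si_model[d]) for d in range(dim)]
```

The key point the sketch captures is that the adaptation data need not cover every training language: once the SD model exists, any language seen by the SI model can be synthesized in the target speaker's voice.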
