Article ID | Journal | Published Year | Pages
---|---|---|---
10370216 | Speech Communication | 2005 | 23
Abstract
In this paper, we describe research to extend an existing talking head, Baldi, to be multilingual. We use a parsimonious client/server architecture that keeps the auditory speech module and the visual speech synthesis module autonomous. This scheme enables the implementation and joint application of text-to-speech synthesis and facial animation in many languages simultaneously. Additional languages can be added to the system by defining, for each language, a unique phoneme set and unique phoneme definitions for the visible speech. The accuracy of these definitions is tested in perceptual experiments in which human observers identify auditory speech in noise, presented either alone or paired with the synthetic face versus a comparable natural face. We illustrate the development of an Arabic talking head, Badr, and demonstrate how the empirical evaluation enabled the improvement of the visible speech synthesis from one version to the next.
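The abstract's scheme, where each new language contributes only its own phoneme set and per-phoneme visible-speech definitions to a shared synthesis server, can be sketched as follows. This is a hypothetical illustration, not the authors' code: the class names, the `register`/`targets_for` methods, and the toy facial-control parameters (`lip_closure`, `jaw_open`) are all assumptions made for clarity.

```python
# Minimal sketch (assumed structure, not the paper's implementation) of a
# visible-speech server to which languages are added by registering a phoneme
# set plus per-phoneme definitions, mirroring the client/server split where
# auditory TTS would run as a separate client.
from dataclasses import dataclass, field


@dataclass
class Language:
    name: str
    phonemes: set
    # Maps each phoneme to hypothetical facial-animation control targets.
    viseme_targets: dict = field(default_factory=dict)


class VisibleSpeechServer:
    def __init__(self):
        self.languages = {}

    def register(self, lang: Language):
        # Adding a language requires only its phoneme set and definitions;
        # nothing else in the server changes.
        self.languages[lang.name] = lang

    def targets_for(self, lang_name: str, phoneme_seq: list):
        # Look up the visible-speech targets for a phoneme sequence.
        lang = self.languages[lang_name]
        return [lang.viseme_targets.get(p) for p in phoneme_seq]


server = VisibleSpeechServer()
# Illustrative entries only; a real phoneme set is far larger.
server.register(Language(
    "arabic",
    {"b", "a", "d", "r"},
    {"b": {"lip_closure": 1.0}, "a": {"jaw_open": 0.8}},
))
print(server.targets_for("arabic", ["b", "a"]))
```

The point of the sketch is the isolation property the abstract claims: registering a second language would touch only its own `Language` object, leaving existing languages and the server untouched.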
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Slim Ouni, Michael M. Cohen, Dominic W. Massaro