Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
569018 | Speech Communication | 2006 | 23 Pages |
In this work, we present a word-level model combination approach that aims to improve the modeling of spontaneous speech variabilities on a highly spontaneous, real-life medical transcription task. The technique (1) separates speech variabilities into pre-defined classes, (2) generates variability-specific acoustic and pronunciation models, and (3) combines these models later, during the search procedure, on a word-by-word basis. We provide a theoretical framework for efficiently integrating the variability-specific acoustic and pronunciation models into the search procedure. The algorithm is general and can be applied to model various speech variabilities. In our experiments, we focused on the variabilities related to filled pauses, rate of speech, and speaker accent. Our best system combines six variability-specific acoustic and pronunciation models at the word level and achieves a relative word error rate reduction of 13% compared to the baseline. In a number of contrast experiments, we evaluated the importance of the different components of our system and explored ways to reduce system complexity.
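To make the idea of word-level combination more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the combination rule, so the sketch assumes that each variability class (for example, a fast-speech class) supplies a log-likelihood for a word hypothesis and that these are merged either as a prior-weighted sum or by taking the best class. The function names, class labels, priors, and all numeric values are hypothetical and chosen only for illustration. A second helper shows how a relative word error rate reduction is computed.

```python
import math
from typing import Dict


def combine_word_score(
    log_likelihoods: Dict[str, float],
    log_priors: Dict[str, float],
    use_max: bool = False,
) -> float:
    """Combine class-specific log-likelihoods of ONE word hypothesis.

    Assumption (not from the source): the combination is either a
    prior-weighted sum over classes or a maximum over classes.
    """
    # Joint log score of the word under each variability class.
    terms = [log_likelihoods[c] + log_priors[c] for c in log_likelihoods]
    if use_max:
        # Maximum approximation: keep only the best-scoring class.
        return max(terms)
    # Numerically stable log-sum-exp over the classes.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative reduction: (baseline - system) / baseline."""
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical scores for a single word under two illustrative classes.
scores = {"normal": math.log(0.2), "fast": math.log(0.4)}
priors = {"normal": math.log(0.5), "fast": math.log(0.5)}

summed = combine_word_score(scores, priors)
best = combine_word_score(scores, priors, use_max=True)

# Hypothetical WER values, chosen only so the ratio equals 13% relative.
reduction = relative_wer_reduction(20.0, 17.4)
```

Because the combination is applied per word, the decoder in this sketch may prefer a different variability class for each word in an utterance, which is what distinguishes word-level combination from selecting a single model for the whole utterance.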