Article ID: 569018 | Journal: Speech Communication | Published Year: 2006 | Pages: 23 | File Type: PDF
Abstract

In this work, we present a word-level model combination approach that aims to improve the modeling of spontaneous speech variabilities on a highly spontaneous, real-life medical transcription task. The technique (1) separates speech variabilities into pre-defined classes, (2) generates variability-specific acoustic and pronunciation models, and (3) combines these models later in the search procedure on a word-by-word basis. We provide a theoretical framework for efficiently integrating the specific acoustic and pronunciation models into the search procedure. The algorithm is general and can be applied to model various speech variabilities. In our experiments, we focused on variabilities related to filled pauses, rate of speech, and speaker accent. Our best system combines six variability-specific acoustic and pronunciation models at the word level and achieves a relative word error rate reduction of 13% compared to the baseline. In a number of contrast experiments, we evaluated the importance of the different components of our system and explored ways to reduce system complexity.
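
As a rough illustration of step (3) and of what a relative word error rate reduction means, the sketch below combines per-class scores for a single word hypothesis. The abstract does not state the combination rule, so the weighted log-sum and the max approximation used here are assumptions, as are the function names, the class labels, and the example numbers; none of them are taken from the paper.

```python
import math


def combine_word_scores(log_likelihoods, log_priors=None):
    """Combine class-specific log-likelihoods for one word hypothesis.

    log_likelihoods: dict mapping a variability class (hypothetical labels)
        to the log-likelihood its class-specific model assigns to the word.
    log_priors: optional dict of log class weights; uniform when omitted.
    Returns (sum_score, max_score, best_class).
    """
    classes = list(log_likelihoods)
    if log_priors is None:
        # Assumption: equal weight for every variability class.
        log_priors = {c: -math.log(len(classes)) for c in classes}

    weighted = {c: log_likelihoods[c] + log_priors[c] for c in classes}
    best_class = max(weighted, key=weighted.get)
    max_score = weighted[best_class]

    # Numerically stable log-sum-exp over the weighted class scores.
    sum_score = max_score + math.log(
        sum(math.exp(v - max_score) for v in weighted.values())
    )
    return sum_score, max_score, best_class


def relative_wer_reduction(baseline_wer, system_wer):
    """Relative reduction, e.g. 0.13 for a 13% relative improvement."""
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical scores for one word under two rate-of-speech classes.
scores = {"fast": -10.0, "slow": -12.0}
sum_score, max_score, best_class = combine_word_scores(scores)

# Hypothetical error rates chosen only to illustrate a 13% relative reduction.
reduction = relative_wer_reduction(20.0, 17.4)
```

Because the combination is applied per word, the winning class can change from one word to the next within an utterance, which is what distinguishes a word-level scheme from choosing a single model for a whole utterance.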

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing