Article ID: 568465
Journal: Speech Communication
Published Year: 2016
Pages: 10
File Type: PDF
Abstract

• We investigate supervised, semi-supervised and unsupervised training of DLMs.
• We use supervised and unsupervised confusion models to generate artificial data.
• We propose three target output selection methods for unsupervised DLM training.
• Ranking perceptron performs better than structured perceptron in most cases.
• Significant gains in ASR accuracy are obtained with unmatched acoustic and text data.

Discriminative language modeling aims to reduce error rates by rescoring the output of an automatic speech recognition (ASR) system. Discriminative language model (DLM) training conventionally follows a supervised approach, using acoustic recordings together with their manual transcriptions (references) as training data; recognition performance improves as the amount of such matched data increases. In this study we investigate the case where matched data for DLM training is limited or not available at all, and explore methods to improve ASR accuracy by incorporating acoustic and text data that come from separate sources. For semi-supervised training, we utilize a confusion model to generate artificial hypotheses instead of the real ASR N-best lists. For unsupervised training, we propose three target output selection methods to substitute for the missing references. We treat this task both as a structured prediction problem and as a reranking problem, and employ two variants of the WER-sensitive perceptron algorithm. We show that significant improvement over the baseline ASR accuracy is obtained even when no transcribed acoustic data is available to train the DLM.
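To make the reranking setup concrete, the following is a minimal sketch of perceptron-style N-best reranking for DLM training. It is an illustration of the general technique only, not the authors' implementation: the unigram `features` function, the `train_structured_perceptron` name, and the simple unscaled update rule are all simplifying assumptions (a WER-sensitive variant, as used in the paper, would additionally weight updates by hypothesis error counts).

```python
# Minimal sketch of perceptron-style N-best reranking for DLM training.
# Illustrative only: feature extraction and the update rule are
# simplified assumptions, not the paper's exact algorithm.
from collections import defaultdict

def features(hypothesis):
    """Unigram count features over a hypothesis string (a stand-in for
    the richer n-gram features a real DLM would use)."""
    feats = defaultdict(float)
    for word in hypothesis.split():
        feats[word] += 1.0
    return feats

def score(weights, hypothesis):
    """Linear model score of a hypothesis under the current weights."""
    return sum(weights[f] * v for f, v in features(hypothesis).items())

def train_structured_perceptron(nbest_lists, targets, epochs=5):
    """nbest_lists: one N-best list (list of hypothesis strings) per utterance.
    targets: the target output per utterance -- a manual transcription in
    supervised training, or a selected hypothesis in unsupervised training.
    Returns a weight vector used to rescore N-best lists."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for nbest, target in zip(nbest_lists, targets):
            # Current 1-best hypothesis under the model.
            best = max(nbest, key=lambda h: score(weights, h))
            if best != target:
                # Standard structured-perceptron update: promote the
                # target's features, demote the current best's.
                for f, v in features(target).items():
                    weights[f] += v
                for f, v in features(best).items():
                    weights[f] -= v
    return weights

# Toy usage: one utterance with a 3-best list and its reference.
w = train_structured_perceptron(
    [["the cat sad", "the cat sat", "a cat sat"]],
    ["the cat sat"],
)
```

In the unsupervised setting described above, `targets` would instead come from one of the proposed target output selection methods rather than from manual transcriptions, and the ranking-perceptron variant would compare pairs of hypotheses within each N-best list rather than a single 1-best against the target.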

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing