Article ID: 565952
Journal: Speech Communication
Published Year: 2012
Pages: 10
File Type: PDF
Abstract

This paper focuses on discriminative language models (DLMs) for large vocabulary speech recognition tasks. To train such models, we usually use a large number of hypotheses generated for each utterance by a speech recognizer, namely an n-best list or a lattice. Since the data size is large, we usually need a high-end machine or a large-scale distributed computing system consisting of many computers for model training. However, it is still unclear whether such a large number of sentence hypotheses is necessary. Furthermore, we do not know which kinds of sentences are necessary. In this paper, we show that we can generate a high-performance model from small subsets of the n-best lists by choosing samples properly, i.e., we describe a sample selection method for DLMs. Sample selection reduces the memory footprint needed for holding training samples and allows us to train models on a standard machine. Furthermore, it enables us to generate a highly accurate model using various types of features. Specifically, experimental results show that even training with only two samples from each list can provide an accurate model with a small memory footprint.
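The abstract does not state which two samples per n-best list are retained. As an illustrative assumption only, a common choice in discriminative language model training is to keep the oracle (lowest word error) hypothesis as the positive example and the recognizer's 1-best hypothesis as the negative example. The minimal sketch below shows that selection step; the function names (word_error_count, select_two_samples) and the selection criterion are hypothetical, not the authors' method.

```python
# Hedged sketch: one plausible way to keep only two samples per n-best list
# for DLM training. The criterion (oracle vs. recognizer 1-best) is an
# illustrative assumption, not necessarily the selection rule in the paper.

def word_error_count(hyp, ref):
    """Levenshtein distance between hypothesis and reference word sequences."""
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            # prev holds the previous row's value at column j-1
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (h != r)) # substitution / match
    return d[len(ref)]

def select_two_samples(nbest, reference):
    """From an n-best list of (hypothesis_words, recognizer_score) pairs,
    keep only the oracle (lowest-error) hypothesis and the 1-best hypothesis."""
    oracle = min(nbest, key=lambda h: word_error_count(h[0], reference))
    one_best = max(nbest, key=lambda h: h[1])
    return oracle, one_best

# Tiny usage example with made-up hypotheses and scores:
nbest = [(["recognize", "speech"], -12.3),
         (["wreck", "a", "nice", "beach"], -11.8)]
reference = ["recognize", "speech"]
oracle, one_best = select_two_samples(nbest, reference)
```

With only these two hypotheses retained per utterance, a structured-perceptron-style update (reward features of the oracle, penalize features of the 1-best) can run on a single standard machine, which is consistent with the memory-footprint argument in the abstract.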

► We show the effect of sample selection in discriminative language model training.
► The effect is investigated in speech recognition with various training approaches.
► The accuracy improves with a small set of selected samples in some training methods.
► Sample selection dramatically reduces the memory footprint needed for training.
► A variety of features is effective and easy to use in this sample selection scenario.

Related Topics
Physical Sciences and Engineering › Computer Science › Signal Processing