Article ID: 558218
Journal: Computer Speech & Language
Published Year: 2016
Pages: 18
File Type: PDF
Abstract

• The differenced MMI (dMMI) is a discriminative criterion that generalizes MPE and BMMI.
• We discuss the behavior of dMMI when there are errors in the transcription labels.
• dMMI may be less sensitive to such errors than other criteria.
• We support our claim with unsupervised speaker adaptation experiments.
• dMMI-based adaptation achieves significant gains over MLLR for two LVCSR tasks.

Discriminative criteria are widely used for training acoustic models for automatic speech recognition (ASR). Many such criteria have been proposed, including maximum mutual information (MMI), minimum phone error (MPE), and boosted MMI (BMMI). Discriminative training is known to provide significant performance gains over conventional maximum-likelihood (ML) training. However, because discriminative criteria aim at direct minimization of the classification error, they rely strongly on accurate reference labels, and errors in those labels directly degrade performance. Recently, the differenced MMI (dMMI) criterion has been proposed as a generalization of conventional criteria such as BMMI and MPE. dMMI approaches BMMI or MPE when its hyper-parameters are set appropriately. Moreover, dMMI introduces intermediate criteria that can be interpreted as smoothed versions of BMMI or MPE, and these smoothed criteria are robust to errors in the reference labels. In this paper, we demonstrate the effect of dMMI on unsupervised speaker adaptation, where the reference labels are estimated from a first recognition pass and thus inevitably contain errors. In particular, we introduce dMMI-based linear regression (dMMI-LR) adaptation and demonstrate significant gains in performance over MLLR and BMMI-LR in two large-vocabulary lecture recognition tasks.
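For context, a hedged sketch of how dMMI generalizes BMMI and MPE, following the form commonly given in the dMMI literature (the notation here is illustrative and may differ from the paper itself): dMMI is a finite difference of two boosted-MMI objectives taken at two boosting margins.

```latex
% Illustrative sketch (notation assumed, not taken from this abstract).
% BMMI objective with boosting margin \sigma, over utterances r with
% acoustics X_r, reference S_r, competing hypotheses S, and an
% accuracy function A(S, S_r):
F_{\mathrm{BMMI}}(\lambda;\sigma)
  = \sum_r \log
    \frac{p_\lambda(X_r \mid S_r)\, P(S_r)}
         {\sum_S p_\lambda(X_r \mid S)\, P(S)\, e^{-\sigma A(S, S_r)}}

% dMMI: the finite difference of two BMMI objectives with margins
% \sigma_1 < \sigma_2:
F_{\mathrm{dMMI}}(\lambda;\sigma_1,\sigma_2)
  = \frac{1}{\sigma_2 - \sigma_1}
    \bigl[ F_{\mathrm{BMMI}}(\lambda;\sigma_2)
         - F_{\mathrm{BMMI}}(\lambda;\sigma_1) \bigr]
```

Under this form, letting \sigma_1 \to -\infty recovers BMMI with margin \sigma_2, while the limit \sigma_1 \to 0^-, \sigma_2 \to 0^+ recovers an MPE-like error criterion; intermediate finite settings of (\sigma_1, \sigma_2) give the smoothed criteria that the abstract argues are more robust to label errors.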

Related Topics
Physical Sciences and Engineering Computer Science Signal Processing
Authors
, , , , ,