On maximum mutual information speaker-adapted training

Article ID	Journal	Published Year	Pages	File Type
558401	Computer Speech & Language	2008	18 Pages	PDF

Abstract

In this work, we combine maximum mutual information (MMI) parameter estimation with speaker-adapted training (SAT). As will be shown, this can be achieved by performing unsupervised estimation of speaker adaptation parameters on the test data, a distinct advantage for many recognition tasks involving conversational speech. We derive re-estimation formulae for the basic speaker-independent means and variances, the optimal regression class for each Gaussian component when multiple speaker-dependent linear transforms are used for adaptation, as well as the optimal feature-space transformation matrix for use with semi-tied covariance matrices. We also propose an approximation to the maximum likelihood and maximum mutual information SAT re-estimation formulae that greatly reduces the amount of disk space required to conduct training on corpora that contain speech from hundreds or thousands of speakers. Finally, we present empirical evidence of the importance of combining speaker adaptation with discriminative training. In particular, on a subset of the data used for the NIST RT05 evaluation, we show that including maximum likelihood linear regression transformations in the MMI re-estimation formulae provides a word error rate (WER) of 35.2%, compared with 39.1% obtained when speaker adaptation is ignored during discriminative training.
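
To put the two reported error rates in perspective, the short sketch below computes the absolute and relative WER reduction they imply. Only the figures 39.1% and 35.2% come from the abstract; the function name `wer_reduction` and its parameter names are hypothetical and chosen purely for illustration.

```python
def wer_reduction(baseline_wer, adapted_wer):
    """Return (absolute, relative) WER reduction.

    baseline_wer: WER in percent without speaker adaptation in MMI training.
    adapted_wer:  WER in percent with speaker adaptation in MMI training.
    The relative reduction is returned as a fraction of the baseline.
    """
    absolute = baseline_wer - adapted_wer
    relative = absolute / baseline_wer
    return absolute, relative


# Figures quoted in the abstract (NIST RT05 subset).
absolute, relative = wer_reduction(39.1, 35.2)
print(f"absolute reduction: {absolute:.1f} points")
print(f"relative reduction: {relative:.1%}")
```

With the abstract's figures this works out to a reduction of about 3.9 points absolute, or roughly 10% relative to the unadapted baseline.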

Keywords

Discriminative training; Speech recognition