Article ID | Journal | Published Year | Pages | File Type
558401 | Computer Speech & Language | 2008 | 18 Pages | PDF
Abstract

In this work, we combine maximum mutual information (MMI) parameter estimation with speaker-adapted training (SAT). As will be shown, this can be achieved by performing unsupervised estimation of speaker adaptation parameters on the test data, a distinct advantage for many recognition tasks involving conversational speech. We derive re-estimation formulae for the basic speaker-independent means and variances, for the optimal regression class of each Gaussian component when multiple speaker-dependent linear transforms are used for adaptation, and for the optimal feature-space transformation matrix for use with semi-tied covariance matrices. We also propose an approximation to the maximum likelihood and MMI SAT re-estimation formulae that greatly reduces the amount of disk space required to conduct training on corpora containing speech from hundreds or thousands of speakers. Finally, we present empirical evidence of the importance of combining speaker adaptation with discriminative training. In particular, on a subset of the data used for the NIST RT05 evaluation, we show that including maximum likelihood linear regression (MLLR) transformations in the MMI re-estimation formulae provides a word error rate (WER) of 35.2%, compared with 39.1% obtained when speaker adaptation is ignored during discriminative training.
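
To put the two error rates quoted above in perspective, the following minimal sketch computes the absolute and relative reduction they imply. The helper name `wer_reduction` is hypothetical and is not taken from the paper; only the two percentages come from the abstract.

```python
def wer_reduction(baseline_wer: float, improved_wer: float) -> tuple[float, float]:
    """Return (absolute reduction in points, relative reduction as a fraction).

    Hypothetical helper for illustration; not part of the paper.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline_wer - improved_wer
    relative = absolute / baseline_wer
    return absolute, relative


# Figures quoted in the abstract, in percent WER:
# 39.1 without speaker adaptation during discriminative training,
# 35.2 with MLLR transformations included in the MMI re-estimation.
absolute, relative = wer_reduction(39.1, 35.2)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
```

Under these figures the gain is about 3.9 points absolute, or roughly 10% relative to the system that ignores speaker adaptation during discriminative training.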

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing