Article ID Journal Published Year Pages File Type
568975 Speech Communication 2006 13 Pages PDF
Abstract

In this paper, several techniques are proposed to incorporate the uncertainty of the clean speech estimate in the decoding process of the backend recogniser in the context of model-based feature enhancement (MBFE) for noise robust speech recognition. Usually, the Gaussians in the acoustic space are sampled in a single point estimate, which means that the backend recogniser considers its input as a noise-free utterance. However, in this way the variance of the estimator is neglected. To solve this problem, it has already been argued that the acoustic space should be evaluated in a probability density function, e.g. a Gaussian observation pdf. We illustrate that this Gaussian observation pdf can be replaced by a computationally more tractable discrete pdf, consisting of a weighted sum of delta functions. We also show how improved posterior state probabilities can be obtained by calculating their maximum likelihood estimates or by using the pdf of clean speech conditioned on both the noisy speech and the backend Gaussian. Another simple and efficient technique is to replace these posterior probabilities by M Kronecker deltas, which results in M front-end feature vector candidates, and to take the maximum over their backend scores. Experimental results are given for the Aurora2 and Aurora4 database to compare the proposed techniques. A significant decrease of the word error rate of the resulting speech recognition system is obtained.

Related Topics
Physical Sciences and Engineering Computer Science Signal Processing
Authors
, , ,