Additive background noise as a source of non-linear mismatch in the cepstral and log-energy domain

Article ID	Journal	Published Year	Pages	File Type
10368613	Computer Speech & Language	2005	24 Pages	PDF

Abstract

The aim of this investigation is to determine to what extent automatic speech recognition may be enhanced if, in addition to the linear compensation accomplished by mean and variance normalisation, a non-linear mismatch reduction technique is applied to the cepstral and energy features, respectively. An additional goal is to determine whether the degree of mismatch between the feature distributions of the training and test data that is associated with acoustic mismatch, differs for the cepstral and energy features. Towards these aims, two non-linear mismatch reduction techniques - time domain noise reduction and histogram normalisation - were evaluated on the Aurora2 digit recognition task as well as on a continuous speech recognition task with noisy test conditions similar to those in the Aurora2 experiments. The experimental results show that recognition performance is enhanced by the application of both non-linear mismatch reduction techniques. The best results are obtained when the two techniques are applied simultaneously. The results also reveal that the mismatch in the energy features is quantitatively and qualitatively much larger than the corresponding mismatch associated with the cepstral coefficients. The most substantial gains in average recognition rate are therefore accomplished by reducing training-test mismatch for the energy features.