Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
10368613 | Computer Speech & Language | 2005 | 24 Pages |
Abstract
The aim of this investigation is to determine to what extent automatic speech recognition may be enhanced if, in addition to the linear compensation accomplished by mean and variance normalisation, a non-linear mismatch reduction technique is applied to the cepstral and energy features, respectively. An additional goal is to determine whether the degree of mismatch between the feature distributions of the training and test data that is associated with acoustic mismatch, differs for the cepstral and energy features. Towards these aims, two non-linear mismatch reduction techniques - time domain noise reduction and histogram normalisation - were evaluated on the Aurora2 digit recognition task as well as on a continuous speech recognition task with noisy test conditions similar to those in the Aurora2 experiments. The experimental results show that recognition performance is enhanced by the application of both non-linear mismatch reduction techniques. The best results are obtained when the two techniques are applied simultaneously. The results also reveal that the mismatch in the energy features is quantitatively and qualitatively much larger than the corresponding mismatch associated with the cepstral coefficients. The most substantial gains in average recognition rate are therefore accomplished by reducing training-test mismatch for the energy features.
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Febe de Wet, Johan de Veth, Loe Boves, Bert Cranen,