Robust speech recognition using spatial–temporal feature distribution characteristics

Journal: Pattern Recognition Letters | Published: 2011 | 8 pages | Article ID: 536528

Abstract

Histogram equalization (HEQ) is one of the most efficient and effective techniques for reducing the mismatch between training and test acoustic conditions. However, most current HEQ methods operate only in a dimension-wise manner and do not take into account the contextual relationships between consecutive speech frames. In this paper, we present several novel HEQ approaches that exploit spatial–temporal feature distribution characteristics for speech feature normalization. Automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task, and the performance of the presented approaches was thoroughly tested and verified through comparisons with other popular HEQ methods. The experimental results show that, for clean-condition training, our approaches yield a significant word error rate reduction over the baseline system and also perform competitively with the other HEQ methods compared in this paper.
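For reference, the conventional dimension-wise HEQ that the abstract contrasts with can be sketched as follows. This is a minimal illustration under stated assumptions, not the spatial–temporal method proposed in the paper: it assumes a standard normal reference distribution, estimates each value's cumulative probability from its rank within a single utterance as (rank + 0.5) / N, and treats every feature dimension independently. The function names `heq` and `heq_dimension` are hypothetical.

```python
from statistics import NormalDist
from typing import List


def heq_dimension(values: List[float]) -> List[float]:
    """Equalize one feature dimension to a standard normal reference (assumed).

    Each value is replaced by the reference quantile that matches its
    empirical CDF, estimated from its rank as (rank + 0.5) / N.
    Tied values receive distinct ranks in input order (stable sort).
    """
    n = len(values)
    reference = NormalDist(mu=0.0, sigma=1.0)
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, i in enumerate(order):
        equalized[i] = reference.inv_cdf((rank + 0.5) / n)
    return equalized


def heq(frames: List[List[float]]) -> List[List[float]]:
    """Apply HEQ dimension by dimension to a (num_frames x num_dims) matrix.

    Note that each dimension is processed in isolation: no information from
    other dimensions or from neighboring frames is used.
    """
    num_dims = len(frames[0])
    columns = [heq_dimension([f[d] for f in frames]) for d in range(num_dims)]
    return [[columns[d][t] for d in range(num_dims)] for t in range(len(frames))]


# Usage: three frames with two feature dimensions each.
features = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
normalized = heq(features)
```

Because the mapping depends only on ranks, both dimensions in the usage example end up with identical normalized values even though their raw scales differ by a factor of ten. This per-dimension, frame-independent behavior is exactly the limitation the abstract identifies.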

Research highlights
► We present several novel HEQ approaches that exploit spatial–temporal feature distribution characteristics for speech feature normalization; the proposed approaches can yield significant performance improvements on the standard noise-robust ASR task.
► The contextual relationships of consecutive speech frames can provide additional cues for feature normalization.
► The HEQ approaches are efficient and effective compared with other robustness methods.

Keywords

Noise robustness; Speech recognition; Histogram equalization