Stereo hidden Markov modeling for noise robust speech recognition

Article ID	Journal	Published Year	Pages	File Type
558412	Computer Speech & Language	2013	13 Pages	PDF

Abstract

This paper investigates a noise robust technique for automatic speech recognition that exploits hidden Markov modeling of stereo speech features from clean and noisy channels. The HMM trained this way, referred to as a stereo HMM, has in each state a Gaussian mixture model (GMM) with a joint distribution of both clean and noisy speech features. Given noisy speech input, the stereo HMM gives rise to a two-pass compensation and decoding process: minimum mean square error (MMSE) denoising based on N-best hypotheses is performed first, followed by decoding of the denoised speech in a reduced search space on a lattice. Compared to feature space GMM-based denoising approaches, the stereo HMM is advantageous because it provides finer-grained noise compensation and uses information from the whole noisy feature sequence to predict each individual clean feature. Experiments on large vocabulary spontaneous speech from speech-to-speech translation applications show that the proposed technique outperforms its feature space counterpart in noisy conditions while still maintaining decent performance in clean conditions.
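As a point of reference, the MMSE estimate that a joint clean/noisy Gaussian mixture gives rise to has a standard closed form. The notation below is an assumption made for illustration and is not taken from the paper: $x_t$ and $y_t$ are the clean and noisy feature vectors at frame $t$, and component $k$ has means $\mu_{x,k}$, $\mu_{y,k}$, cross-covariance $\Sigma_{xy,k}$ and noisy-channel covariance $\Sigma_{yy,k}$.

```latex
\hat{x}_t \;=\; \mathbb{E}[x_t \mid y_t]
\;=\; \sum_{k} p(k \mid y_t)\,
\Bigl[\mu_{x,k} + \Sigma_{xy,k}\,\Sigma_{yy,k}^{-1}\,\bigl(y_t - \mu_{y,k}\bigr)\Bigr]
```

In the feature space GMM case the weights $p(k \mid y_t)$ depend only on the current frame. In a stereo HMM one would expect them to be replaced by state and mixture posteriors computed from the whole sequence $y_1, \dots, y_T$ over the N-best hypotheses, which is consistent with the abstract's claim that each clean feature is predicted from the entire noisy feature sequence.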

► A hidden Markov model (HMM) is trained on stereo speech features from clean and noisy channels for noise robust automatic speech recognition.
► The stereo HMM results in a two-pass compensation and decoding strategy.
► It provides finer-grained noisy feature compensation than the stereo GMM.
► The stereo HMM uses information from the whole noisy feature sequence to compensate each individual clean feature.
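The feature space GMM-based denoising that the stereo HMM is compared against can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mmse_denoise`, its argument layout, and the use of full covariance matrices are all hypothetical choices. Each frame's clean estimate is a posterior-weighted sum of per-component conditional means.

```python
import numpy as np


def mmse_denoise(y, weights, mu_x, mu_y, cov_xy, cov_yy):
    """MMSE estimate of clean features given noisy ones under a joint GMM.

    Hypothetical helper for illustration only.
    y:       (T, D)    noisy feature frames
    weights: (K,)      mixture weights
    mu_x:    (K, D)    clean-channel means
    mu_y:    (K, D)    noisy-channel means
    cov_xy:  (K, D, D) clean/noisy cross-covariances
    cov_yy:  (K, D, D) noisy-channel covariances
    """
    T, D = y.shape
    K = weights.shape[0]
    log_post = np.empty((T, K))
    cond_mean = np.empty((T, K, D))

    for k in range(K):
        diff = y - mu_y[k]                               # (T, D)
        # Rows of `solved` hold cov_yy^{-1} (y_t - mu_y) for each frame.
        solved = np.linalg.solve(cov_yy[k], diff.T).T    # (T, D)
        maha = np.sum(diff * solved, axis=1)             # Mahalanobis terms
        _, logdet = np.linalg.slogdet(cov_yy[k])
        # Log of weight times the Gaussian likelihood of the noisy frame.
        log_post[:, k] = np.log(weights[k]) - 0.5 * (
            maha + logdet + D * np.log(2.0 * np.pi)
        )
        # Conditional mean of the clean channel for component k.
        cond_mean[:, k, :] = mu_x[k] + solved @ cov_xy[k].T

    # Normalise to frame-level component posteriors p(k | y_t).
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Posterior-weighted sum of conditional means, one row per frame.
    return np.einsum("tk,tkd->td", post, cond_mean)
```

Here the posteriors are computed per frame, which is the limitation the highlights point to. A stereo HMM variant would presumably swap them for state-level posteriors obtained from the full utterance over the N-best hypotheses, while keeping the same conditional-mean form.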

Keywords

Stochastic mapping; Noise robustness; Automatic speech recognition