Least squares filtering of speech signals for robust ASR

Article ID	Journal	Published Year	Pages	File Type
568977	Speech Communication	2006	17 Pages	PDF

Abstract

The behavior of the least squares filter (LeSF) is analyzed for a class of non-stationary signals embedded in white noise. These signals are either (a) composed of multiple sinusoids (voiced speech) whose frequencies, phases, and amplitudes may vary from block to block, or (b) the output of an all-pole filter excited by white noise (unvoiced speech segments). In this work, analytic expressions for the weights and the output of the LeSF are derived as functions of the block length and of the signal SNR computed over the corresponding block. We use the LeSF estimated on each block to enhance speech signals embedded in white noise as well as in realistic noises such as factory noise and aircraft cockpit noise [Varga, A., Steeneken, H., Tomlinson, M., Jones, D., 1992. The NOISEX-92 study on the effect of additive noise on automatic speech recognition. Technical Report, DRA Speech Research Unit, Malvern, England]. Automatic speech recognition (ASR) experiments on a connected numbers task, OGI Numbers95, show that the proposed LeSF-based features significantly improve recognition accuracy in various non-stationary noise conditions compared with unenhanced speech, spectral subtraction, and noise-robust CJ-RASTA-PLP features.
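The abstract describes estimating a least squares filter on each block of noisy speech and using its output as the enhanced signal. The Python/NumPy sketch below illustrates that block-wise idea under stated assumptions. It treats the LeSF as a delayed linear predictor, in the style of an adaptive line enhancer, whose weights are the least squares solution over the current block. The function names `lesf_weights`, `lesf_enhance` and `snr_db`, and the values of `block_len`, `order` and `delay`, are hypothetical and are not taken from the paper. Because white noise is uncorrelated with its own past samples, such a predictor mainly reproduces the predictable (sinusoidal) part of the signal.

```python
import numpy as np

# Hypothetical sketch: the LeSF is ASSUMED here to be a delayed linear
# predictor whose weights are the least squares solution over one block.


def lesf_weights(block, order=16, delay=1):
    """Least squares weights for one block.

    Minimises sum_n (x[n] - sum_k w[k] * x[n - delay - k])**2 over the
    samples of the block whose past samples all lie inside the block.
    """
    x = np.asarray(block, dtype=float)
    first = delay + order - 1  # first sample with a full set of past samples
    # Row for sample n holds x[n-delay], x[n-delay-1], ..., x[n-delay-order+1].
    A = np.array([x[n - delay - np.arange(order)] for n in range(first, len(x))])
    target = x[first:]
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    return w


def lesf_enhance(x, block_len=256, order=16, delay=1):
    """Filter x block by block, re-estimating the weights for every block."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    for start in range(0, len(x), block_len):
        stop = min(start + block_len, len(x))
        block = x[start:stop]
        if len(block) < 2 * (order + delay):
            y[start:stop] = block  # too few samples to estimate w; pass through
            continue
        w = lesf_weights(block, order, delay)
        # Causal FIR h with h[delay + k] = w[k], so y[n] = sum_k w[k] x[n-delay-k].
        h = np.concatenate([np.zeros(delay), w])
        lo = max(0, start - len(h) + 1)  # history needed by the block's first output
        filtered = np.convolve(x[lo:stop], h)[: stop - lo]
        y[start:stop] = filtered[start - lo:]
    return y


def snr_db(clean, estimate):
    """SNR of an estimate with respect to the clean reference, in dB."""
    err = np.asarray(estimate, dtype=float) - np.asarray(clean, dtype=float)
    return 10 * np.log10(np.sum(np.square(clean)) / np.sum(np.square(err)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(256)
    # Four blocks, each a sinusoid with its own frequency, amplitude and phase.
    params = [(0.05, 1.0, 0.0), (0.10, 0.8, 1.0), (0.15, 1.2, 2.0), (0.08, 1.0, 0.5)]
    clean = np.concatenate([a * np.sin(2 * np.pi * f * n + p) for f, a, p in params])
    noisy = clean + 0.5 * rng.standard_normal(clean.size)
    enhanced = lesf_enhance(noisy, block_len=256, order=16, delay=1)
    print(f"input SNR {snr_db(clean, noisy):.1f} dB, "
          f"output SNR {snr_db(clean, enhanced):.1f} dB")
```

Re-estimating the weights on every block is what lets the filter follow sinusoids whose frequency, phase and amplitude change from block to block. The trade-off is that each estimate rests on only `block_len` samples. This fits the abstract's point that the LeSF weights and output depend on the block length and the block SNR.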

Keywords

Speech enhancement; Robust speech recognition; Adaptive filtering; Least squares