Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition

Article ID	Journal	Published Year	Pages	File Type
565324	Speech Communication	2011	11 Pages	PDF

Abstract

Cepstral normalisation in automatic speech recognition is investigated in the context of robustness to additive noise. In this paper, it is argued that such normalisation leads naturally to a speech feature based on signal-to-noise ratio (SNR) rather than on absolute energy (or power). Explicit calculation of this SNR-cepstrum from a noise estimate is shown to have theoretical and practical advantages over the usual energy-based cepstrum. The relationship between the SNR-cepstrum and the articulation index, known from psycho-acoustics, is discussed. Experiments are presented suggesting that combining the SNR-cepstrum with the well-known perceptual linear prediction (PLP) method can be beneficial in noisy environments.
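One way to see why cepstral normalisation points towards an SNR-based feature is the short derivation below. It is a sketch under stated assumptions, not the paper's own derivation: the notation is illustrative, with $X_t(j)$ the filterbank power in channel $j$ at frame $t$, $C$ a linear DCT matrix, and $T$ the number of frames over which the cepstral mean is taken.

```latex
\begin{align}
  c_t &= C \log X_t \\
  \hat{c}_t &= c_t - \frac{1}{T}\sum_{\tau=1}^{T} c_\tau
             = C\left(\log X_t - \frac{1}{T}\sum_{\tau=1}^{T}\log X_\tau\right)
             = C \log\frac{X_t}{\bar{X}}, \\
  \bar{X}(j) &= \left(\prod_{\tau=1}^{T} X_\tau(j)\right)^{1/T}
\end{align}
```

Because $C$ is linear, mean normalisation in the cepstral domain is the cepstrum of the ratio between each frame's power and a per-channel reference level, here the geometric mean over the utterance. Replacing $\bar{X}$ with an explicit noise estimate $\hat{N}$ gives $C \log (X_t / \hat{N})$, a cepstrum of a signal-to-noise ratio.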

► Cepstral normalisation is shown to be equivalent to using the SNR-spectrum and SNR-cepstrum.
► Calculating the SNR-spectrum directly, rather than relying on cepstral mean normalisation (CMN) to do it implicitly, is beneficial.
► The SNR-cepstrum is closely related to the articulation index known in psycho-acoustics.
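The contrast between implicit (CMN) and explicit SNR computation can be sketched in plain Python. This is a minimal illustration under assumptions, not the paper's implementation: the input is assumed to be a list of filterbank power frames, the noise estimate is hypothetically taken as the mean of the first few frames (assumed to be non-speech), and the SNR is the simple a posteriori ratio of frame power to noise power. The function names `cmn_cepstrum` and `snr_cepstrum` are illustrative only.

```python
import math


def dct_ii(x, n_ceps):
    """Unnormalised type-II DCT of x, keeping the first n_ceps coefficients."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * i * (k + 0.5) / n) for k in range(n))
            for i in range(n_ceps)]


def cmn_cepstrum(frames, n_ceps):
    """Energy-based cepstrum followed by cepstral mean normalisation (CMN).

    Subtracting the per-coefficient mean over the utterance is equivalent to
    dividing each channel by its geometric-mean power before the log.
    """
    ceps = [dct_ii([math.log(p) for p in frame], n_ceps) for frame in frames]
    t = len(ceps)
    mean = [sum(c[i] for c in ceps) / t for i in range(n_ceps)]
    return [[c[i] - mean[i] for i in range(n_ceps)] for c in ceps]


def snr_cepstrum(frames, n_ceps, n_noise_frames=10):
    """Hypothetical SNR-cepstrum: divide by an explicit noise estimate, then log + DCT.

    Assumption: the first n_noise_frames frames contain noise only, so their
    per-channel mean power serves as the noise estimate.
    """
    n_noise = min(n_noise_frames, len(frames))
    n_chan = len(frames[0])
    noise = [sum(f[j] for f in frames[:n_noise]) / n_noise for j in range(n_chan)]
    out = []
    for frame in frames:
        # A posteriori SNR per channel: frame power over estimated noise power.
        snr = [frame[j] / noise[j] for j in range(n_chan)]
        out.append(dct_ii([math.log(s) for s in snr], n_ceps))
    return out
```

Both features are unchanged if every channel is multiplied by a fixed gain, since each divides the power by a per-channel reference level; they differ only in how that reference is chosen (utterance geometric mean versus explicit noise estimate).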

Keywords

Noise robustness; Automatic speech recognition; AURORA