Article ID | Journal | Published Year | Pages |
---|---|---|---|
565324 | Speech Communication | 2011 | 11 |
Cepstral normalisation in automatic speech recognition is investigated in the context of robustness to additive noise. In this paper, it is argued that such normalisation leads naturally to a speech feature based on signal-to-noise ratio (SNR) rather than absolute energy (or power). Explicit calculation of this SNR-cepstrum by means of a noise estimate is shown to have theoretical and practical advantages over the usual (energy-based) cepstrum. The relationship between the SNR-cepstrum and the articulation index, known in psycho-acoustics, is discussed. Experiments are presented suggesting that combining the SNR-cepstrum with the well-known perceptual linear prediction method can be beneficial in noisy environments.
► Cepstral normalisation is shown to be equivalent to using the SNR-spectrum and SNR-cepstrum.
► Calculation of the SNR-spectrum directly, rather than relying on CMN to do it, is beneficial.
► The SNR-cepstrum is closely related to the articulation index known in psycho-acoustics.
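
The link between the energy-based cepstrum and an SNR-based one can be illustrated with a small numerical sketch. This is not the paper's implementation: the function names (`dct_ii`, `cepstrum`, `snr_cepstrum`, `cmn`), the flooring constant, the unnormalised DCT-II, and the random test data are all assumptions made for illustration, and no filter bank or noise tracker is modelled. The sketch only shows that dividing the power spectrum by a noise estimate before taking the log and the DCT gives a feature that does not change when signal and noise are scaled by a common gain, and that with a noise estimate held fixed across frames, mean normalisation produces the same result from either cepstrum.

```python
import numpy as np


def dct_ii(x):
    """Unnormalised DCT-II along the last axis (illustrative helper)."""
    n = x.shape[-1]
    k = np.arange(n)
    # basis[k, m] = cos(pi * k * (2m + 1) / (2n))
    basis = np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n))
    return x @ basis.T


def cepstrum(power, floor=1e-10):
    """Usual energy-based cepstrum: DCT of the log power spectrum."""
    return dct_ii(np.log(np.maximum(power, floor)))


def snr_cepstrum(power, noise_power, floor=1e-10):
    """Hypothetical SNR-cepstrum: DCT of the log of power over a noise estimate."""
    snr = np.maximum(power, floor) / np.maximum(noise_power, floor)
    return dct_ii(np.log(snr))


def cmn(features):
    """Cepstral mean normalisation: subtract the per-coefficient mean over frames."""
    return features - features.mean(axis=0, keepdims=True)


# Synthetic data, assumed purely for the demonstration: 50 frames, 20 bins.
rng = np.random.default_rng(0)
power = rng.uniform(0.5, 5.0, size=(50, 20))
noise = rng.uniform(0.1, 1.0, size=20)  # one noise estimate shared by all frames

# A common gain changes the energy cepstrum but cancels in the SNR-cepstrum.
gain = 7.0
energy_changed = not np.allclose(cepstrum(gain * power), cepstrum(power))
snr_unchanged = np.allclose(snr_cepstrum(gain * power, gain * noise),
                            snr_cepstrum(power, noise))

# With a noise estimate that is constant over frames, CMN removes it anyway.
cmn_matches = np.allclose(cmn(snr_cepstrum(power, noise)), cmn(cepstrum(power)))

print(energy_changed, snr_unchanged, cmn_matches)
```

Because the log turns the division into a subtraction and the DCT is linear, the SNR-cepstrum here equals the energy cepstrum minus the cepstrum of the noise estimate. In practice the noise estimate would vary over time, which is where computing the SNR-spectrum explicitly and relying on CMN alone would be expected to differ.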