Joint variable frame rate and length analysis for speech recognition under adverse conditions

Article ID	Journal	Published Year	Pages	File Type
453708	Computers & Electrical Engineering	2014	11 Pages	PDF

Abstract

• A computationally efficient variable frame length and rate method is proposed.
• The method relies on an a posteriori signal-to-noise ratio weighted energy distance.
• The method improves speech recognition accuracy in noisy environments.
• Setting a proper range of allowed frame lengths is important.

This paper presents a method that combines variable frame length and variable frame rate analysis for speech recognition in noisy environments, together with an investigation of the effect of different frame lengths on speech recognition performance. The method selects frames using an a posteriori signal-to-noise ratio (SNR) weighted energy distance and increases the length of each selected frame according to the number of non-selected frames that precede it. It assigns a higher frame rate and a normal frame length to rapidly changing, high-SNR regions of a speech signal, and a lower frame rate and an increased frame length to steady or low-SNR regions. The speech recognition results show that the proposed variable frame rate and length method is more noise-robust than both fixed frame rate and length analysis and standalone variable frame rate analysis.
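The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the details here are assumptions. In particular, the distance is taken as the absolute log-energy difference between consecutive frames multiplied by a log-domain a posteriori SNR floored at zero, a frame is selected once the accumulated distance reaches a threshold, and a selected frame is lengthened backwards over the skipped frames up to a cap. The names `frame_log_energy`, `select_frames`, `noise_log_energy`, `base_len`, `step`, `threshold`, and `max_len` are all hypothetical.

```python
import math


def frame_log_energy(x, start, length):
    """Log energy of x[start:start + length], with a small floor to avoid log(0)."""
    seg = x[start:start + length]
    return math.log(sum(v * v for v in seg) + 1e-12)


def select_frames(x, noise_log_energy, base_len, step, threshold, max_len):
    """Return (start, length) pairs for the selected frames.

    Assumed rule (not taken from the paper): accumulate an SNR-weighted
    log-energy distance over densely spaced candidate frames and select a
    frame whenever the accumulated distance reaches `threshold`.
    """
    selected = []
    acc = 0.0        # accumulated weighted distance since the last selection
    skipped = 0      # number of non-selected candidates since the last selection
    prev_e = None
    for start in range(0, len(x) - base_len + 1, step):
        e = frame_log_energy(x, start, base_len)
        if prev_e is not None:
            # Assumed weight: a posteriori SNR in the log domain, floored at 0,
            # so low-SNR regions contribute little and are sampled sparsely.
            snr = max(e - noise_log_energy, 0.0)
            acc += abs(e - prev_e) * snr
        prev_e = e
        if acc >= threshold:
            # Lengthen the frame by the span of skipped candidates, capped at max_len.
            length = min(base_len + skipped * step, max_len)
            end = start + base_len
            begin = max(end - length, 0)
            selected.append((begin, end - begin))
            acc = 0.0
            skipped = 0
        else:
            skipped += 1
    return selected


# Toy signal: a quiet steady segment followed by an abrupt loud segment.
signal = [0.01] * 1000 + [1.0] * 1000
noise = frame_log_energy(signal, 0, 200)  # noise estimate from the leading quiet part
frames = select_frames(signal, noise, base_len=200, step=100, threshold=1.0, max_len=400)
```

In this toy run no frame is selected inside the steady quiet segment, and the first frame chosen at the onset is lengthened to the cap because many candidates were skipped before it. The cap `max_len` plays the role of the upper end of the allowed frame-length range, which the highlights identify as an important setting.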
