Article ID: 455070
Journal: Computers & Electrical Engineering
Published Year: 2013
Pages: 8 Pages
File Type: PDF
Abstract

Several automatic speech recognition engines use Mel frequency cepstral coefficient (MFCC) features internally. Specifically, these features, extracted from speech, are used to build acoustic models in the form of hidden Markov models (HMMs). However, speech features depend on the sampling rate of the speech, so acoustic models built from features extracted at one sampling rate cannot be used by a speech engine to recognize speech sampled at a different rate. In this paper, we first derive a relationship between the MFCC features of re-sampled speech and those of the originally sampled speech, and then propose a modified Mel filter bank so that the features extracted at different sampling frequencies are correlated. We show experimentally that acoustic models built with speech sampled at one frequency can be used to recognize sub-sampled speech with high accuracy.
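To make the sampling-rate dependence concrete, the following is a minimal sketch of a conventional triangular Mel filter bank, not the modified filter bank proposed in the paper. It assumes the common mel mapping 2595 log10(1 + f/700) and filters spaced evenly on the mel scale between 0 Hz and the Nyquist frequency; the function names, the 26-filter count, and the 512-point FFT size are illustrative choices, not values taken from the paper.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (one common convention, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_edge_frequencies(sample_rate_hz, num_filters=26):
    """Edge frequencies (Hz) of filters equally spaced in mel from 0 to Nyquist."""
    nyquist_hz = sample_rate_hz / 2.0
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist_hz), num_filters + 2)
    return mel_to_hz(edges_mel)


def mel_center_frequencies(sample_rate_hz, num_filters=26):
    """Center frequencies (Hz) of the triangular filters."""
    return mel_edge_frequencies(sample_rate_hz, num_filters)[1:-1]


def mel_filter_bank(sample_rate_hz, fft_size=512, num_filters=26):
    """Return a (num_filters, fft_size // 2 + 1) matrix of triangular weights."""
    edges_hz = mel_edge_frequencies(sample_rate_hz, num_filters)
    bin_hz = np.linspace(0.0, sample_rate_hz / 2.0, fft_size // 2 + 1)
    bank = np.zeros((num_filters, bin_hz.size))
    for m in range(num_filters):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_hz - left) / (center - left)
        falling = (right - bin_hz) / (right - center)
        # Triangle: the smaller of the two slopes, floored at zero.
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


# The same filter index covers different frequencies at different sampling rates.
centers_16k = mel_center_frequencies(16000)
centers_8k = mel_center_frequencies(8000)
bank_16k = mel_filter_bank(16000)
```

With this conventional design, filter number m sits at a different frequency for 16 kHz speech than for 8 kHz speech, so the resulting cepstral coefficients are not directly comparable across sampling rates. This is the mismatch the abstract describes.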

Highlights
► Mel filter bank design to enable reliable recognition of sub-sampled speech.
► Enables use of the same acoustic models to recognize speech at any sampling frequency.
► Makes use of all information available in the sub-sampled speech signal.
► Speech recognition accuracies degrade very gradually with higher sub-sampling.
► Recognition accuracy is only 4% below the baseline for sub-sampling by a factor of 2.

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications