Article ID | Journal ID | Publication year | English article | Full-text version |
---|---|---|---|---|
455070 | 695334 | 2013 | 8-page PDF | Free download |

Several automatic speech recognition engines use Mel frequency cepstral coefficient (MFCC) features internally. Specifically, these features, extracted from speech, are used to build acoustic models in the form of hidden Markov models (HMMs). However, speech features depend on the sampling rate of the speech, so acoustic models built from features extracted at one sampling rate cannot be used by a speech engine to recognize speech sampled at a different rate. In this paper, we first derive a relationship between the MFCC features of the re-sampled speech and those of the originally sampled speech, and then propose a modified Mel filter bank so that the features extracted at different sampling frequencies are correlated. We show experimentally that acoustic models built with speech sampled at one frequency can be used to recognize sub-sampled speech with high accuracy.
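To illustrate why the sampling rate matters, the sketch below places the centre frequencies of a conventional Mel filter bank between 0 Hz and the Nyquist frequency for two sampling rates. It is not the modified filter bank proposed in the paper. The mel-scale formula, the choice of 20 filters, and the function names `hz_to_mel`, `mel_to_hz`, and `mel_center_frequencies` are assumptions made only for this example.

```python
import math


def hz_to_mel(f_hz):
    # One commonly used mel-scale formula (an assumed convention here).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(sample_rate_hz, num_filters=20):
    # Centre frequencies (in Hz) of triangular filters spaced evenly
    # on the mel scale between 0 Hz and the Nyquist frequency.
    nyquist_mel = hz_to_mel(sample_rate_hz / 2.0)
    step = nyquist_mel / (num_filters + 1)
    return [mel_to_hz(step * (k + 1)) for k in range(num_filters)]


# Hypothetical comparison: the same filter index covers a different
# frequency region at 16 kHz than at 8 kHz.
centers_16k = mel_center_frequencies(16000)
centers_8k = mel_center_frequencies(8000)

for k in (0, 9, 19):
    print(f"filter {k + 1:2d}: {centers_16k[k]:7.1f} Hz at 16 kHz, "
          f"{centers_8k[k]:7.1f} Hz at 8 kHz")
```

Because each filter index maps to a different frequency band at each rate, the resulting filter-bank energies, and hence the MFCCs, are not directly comparable across sampling rates unless the filter bank is adapted.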
Highlights
► Mel filter bank design to enable reliable recognition of sub-sampled speech.
► Enables use of the same acoustic models to recognize speech at any sampling frequency.
► Makes use of all information available in the sub-sampled speech signal.
► Speech recognition accuracy degrades very gradually as the sub-sampling factor increases.
► Recognition accuracy is only 4% below the baseline for sub-sampling by a factor of 2.
Journal: Computers & Electrical Engineering - Volume 39, Issue 2, February 2013, Pages 655–662