Article ID: 567514
Journal: Speech Communication
Published Year: 2012
Pages: 15 Pages
File Type: PDF
Abstract

We propose a novel framework for noise-robust automatic speech recognition (ASR) based on cochlear implant-like spectrally reduced speech (SRS). Two experimental protocols (EPs) are proposed to clarify the advantage of using SRS for noise-robust ASR. Both EPs assess SRS in the training and testing environments. Speech enhancement is used in one of the two EPs to improve the quality of the test speech. In training, SRS is synthesized from the original clean speech, whereas in testing, SRS is synthesized either directly from noisy speech or from enhanced speech signals. The synthesized SRS is recognized with ASR systems trained on SRS signals generated with the same synthesis parameters. Experiments show that the word accuracy of ASR systems using SRS is significantly improved compared to the baseline non-SRS ASR systems. We also propose a measure of the training and testing mismatch based on the Kullback–Leibler divergence. The numerical results show that using SRS in ASR systems significantly reduces the training and testing mismatch due to environmental noise. The HMM-based ASR systems were trained and the recognition tests were performed using the HTK toolkit and the Aurora 2 speech database.

► ASR systems based on cochlear implant-like SRS gain noise robustness.
► SRS synthesized from clean speech is used in training, and SRS synthesized from noisy speech is used in testing.
► Train/test mismatch can be measured via the Kullback–Leibler divergence (KLD).
► Using SRS in ASR significantly reduces the train/test mismatch, as measured by KLD.
► Experiments are performed using the Aurora 2 database and the HTK toolkit.
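
As an illustration of how a KLD-based train/test mismatch could be computed, the sketch below fits one diagonal-covariance Gaussian to a set of training feature vectors and another to a set of test feature vectors, then evaluates the closed-form Kullback–Leibler divergence between them. This is only a minimal sketch under stated assumptions: the abstract does not specify the exact form of the measure, so the single-Gaussian model, the symmetric variant, and the function names (`fit_diag_gaussian`, `kl_diag_gaussians`, `symmetric_kl`) are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fit_diag_gaussian(frames: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Estimate the per-dimension mean and (biased) variance of feature frames.

    Assumption: each frame is one feature vector (for example, cepstral
    coefficients); the variance is floored to avoid division by zero.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-8)
        for d in range(dim)
    ]
    return means, variances


def kl_diag_gaussians(mu_p: Vector, var_p: Vector,
                      mu_q: Vector, var_q: Vector) -> float:
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(mu_p: Vector, var_p: Vector,
                 mu_q: Vector, var_q: Vector) -> float:
    """Symmetrized divergence KL(p || q) + KL(q || p) (an illustrative choice)."""
    return (kl_diag_gaussians(mu_p, var_p, mu_q, var_q)
            + kl_diag_gaussians(mu_q, var_q, mu_p, var_p))


if __name__ == "__main__":
    # Toy, made-up feature frames standing in for training and test data.
    train_frames = [[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]]
    test_frames = [[0.5, 1.5], [2.5, 3.5], [1.5, 2.5]]
    mu_tr, var_tr = fit_diag_gaussian(train_frames)
    mu_te, var_te = fit_diag_gaussian(test_frames)
    print("mismatch:", symmetric_kl(mu_tr, var_tr, mu_te, var_te))
```

Under this sketch, a smaller value indicates that the training and test feature distributions are closer; comparing the value obtained with and without SRS would be one way to quantify a change in mismatch.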

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing