Article ID: 557935
Journal: Computer Speech & Language
Published Year: 2011
Pages: 14
File Type: PDF
Abstract

Automatic speech recognition (ASR) in reverberant environments remains a challenging task. In this study, we propose a robust feature-extraction method based on the normalization of sub-band temporal modulation envelopes (TMEs). The sub-band TMEs are extracted with a bank of constant-bandwidth band-pass filters, followed by the Hilbert transform and low-pass filtering. Using these TMEs, the modulation spectra in both the clean and the reverberant spaces are transformed to a reference space by means of modulation transfer functions (MTFs), which are estimated as a measure of the modulation transfer effect on the sub-band TMEs between the clean, reverberant, and reference spaces. Applying the MTFs to the modulation spectrum is expected to remove the differences in the modulation spectrum caused by different recording environments. From the normalized modulation spectrum, the sub-band TMEs are restored by an inverse Fourier transform that retains their original phase information. We evaluated the proposed method in speech recognition experiments in a reverberant room with various speaker-to-microphone distances (SMDs). As the baseline, we used conventional Mel-frequency cepstral coefficients (MFCCs) with mean and variance normalization. Averaged over SMDs from 50 cm to 400 cm, the experimental results showed a 44.96% relative improvement from sub-band TME processing alone, and a further 15.68% relative improvement from normalizing the modulation spectrum of the sub-band TMEs. Overall, we obtained a 53.59% relative improvement, which was better than that of other temporal filtering and normalization methods.
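The three reported figures are mutually consistent if each "relative improvement" is read as a relative reduction of the recognition error and the two stages compound multiplicatively: 1 - (1 - 0.4496)(1 - 0.1568) ≈ 0.5359. The abstract does not define the error measure, so this reading is an assumption. A minimal Python check (the function name `combined_relative_reduction` is hypothetical):

```python
def combined_relative_reduction(*reductions: float) -> float:
    """Combine successive relative error reductions given as fractions.

    Assumes each stage removes a fraction r of the error left by the previous stage.
    """
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r  # fraction of the previous error that is left over
    return 1.0 - remaining


tme_only = 0.4496    # sub-band TME processing vs. the MFCC baseline
mtf_on_top = 0.1568  # further gain from modulation-spectrum normalization
total = combined_relative_reduction(tme_only, mtf_on_top)
print(f"{total:.2%}")  # 53.59%
```

Reading the same figures as relative increases in accuracy would instead give about 67.69%, so the multiplicative error-reduction reading is the one that reproduces the reported 53.59%.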

Research highlights
▶ Sub-band temporal envelopes contribute to speech intelligibility.
▶ Sub-band temporal envelopes are more robust to reverberation than sub-band power energy.
▶ Normalization of sub-band temporal envelopes can reduce the mismatch caused by reverberant environments.
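The processing chain named in the abstract (constant-bandwidth band-pass filtering, Hilbert envelope, low-pass filtering, then scaling the modulation spectrum by an MTF while keeping the original phase) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the brick-wall FFT filters, the band layout (`bandwidth`, `n_bands`), the 16 Hz envelope cutoff, and the MTF estimate as a ratio of averaged modulation magnitudes are hypothetical choices, the helper names (`ideal_filter`, `hilbert_envelope`, `subband_tmes`, `estimate_mtf`, `normalize_tmes`) are invented for illustration, and the demo signals are synthetic.

```python
import numpy as np


def ideal_filter(x, fs, lo, hi):
    """Keep only rFFT bins with lo <= f < hi (brick-wall band-pass or low-pass)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def hilbert_envelope(x):
    """Magnitude of the analytic signal, using an FFT-based Hilbert transform."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def subband_tmes(x, fs, bandwidth=500.0, n_bands=16, env_cutoff=16.0):
    """Sub-band TMEs: constant-bandwidth band-pass -> Hilbert envelope -> low-pass.
    bandwidth, n_bands and env_cutoff are hypothetical defaults."""
    tmes = []
    for k in range(n_bands):
        sub = ideal_filter(x, fs, k * bandwidth, (k + 1) * bandwidth)
        tmes.append(ideal_filter(hilbert_envelope(sub), fs, 0.0, env_cutoff))
    return np.stack(tmes)  # shape (n_bands, len(x))


def estimate_mtf(tmes_space, tmes_ref, n_fft, eps=1e-12):
    """Hypothetical MTF estimate: average reference modulation magnitude divided by
    average source-space modulation magnitude, per band and modulation frequency."""
    def avg_mag(tme_list):
        return np.mean([np.abs(np.fft.rfft(t, n=n_fft, axis=-1)) for t in tme_list], axis=0)
    return avg_mag(tmes_ref) / (avg_mag(tmes_space) + eps)


def normalize_tmes(tmes, mtf, n_fft):
    """Scale the modulation-spectrum magnitude by the MTF, keep the original phase,
    and inverse-transform back to (normalized) sub-band TMEs."""
    spec = np.fft.rfft(tmes, n=n_fft, axis=-1)
    normalized = mtf * np.abs(spec) * np.exp(1j * np.angle(spec))
    return np.fft.irfft(normalized, n=n_fft, axis=-1)[:, : tmes.shape[-1]]


# Hypothetical demo data: white-noise "utterances" and an exponentially decaying
# random impulse response (50 ms time constant) standing in for a reverberant room.
rng = np.random.default_rng(0)
fs = 16000  # sampling rate in Hz
n = fs      # 1 s signals
rir = rng.standard_normal(fs // 4) * np.exp(-np.arange(fs // 4) / (0.05 * fs))
clean = [rng.standard_normal(n) for _ in range(3)]
reverb = [np.convolve(x, rir)[:n] for x in clean]
reference = [rng.standard_normal(n) for _ in range(3)]

n_fft = 2 * n  # zero-pad to reduce circular wrap-around from magnitude scaling
reverb_tmes = [subband_tmes(x, fs) for x in reverb]
ref_tmes = [subband_tmes(x, fs) for x in reference]
mtf_reverb = estimate_mtf(reverb_tmes, ref_tmes, n_fft)
normalized = normalize_tmes(reverb_tmes[0], mtf_reverb, n_fft)
print(mtf_reverb.shape, normalized.shape)  # (16, 16001) (16, 16000)
```

Because only the magnitude is rescaled and the phase of each modulation spectrum is reused, an MTF of 1 everywhere returns the input TMEs unchanged; a practical front end would likely replace the brick-wall filters with proper filter designs and frame-based processing.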
