Article code | Journal code | Publication year | English article | Full-text version
557935 | 874817 | 2011 | 14 pages, PDF | Free download
English title of the ISI article
Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
English abstract

Automatic speech recognition (ASR) in reverberant environments is still a challenging task. In this study, we propose a robust feature-extraction method based on normalizing the sub-band temporal modulation envelopes (TMEs). The sub-band TMEs are extracted using a series of constant-bandwidth band-pass filters with Hilbert transforms, followed by low-pass filtering. Based on these TMEs, the modulation spectra in both the clean and reverberation spaces are transformed to a reference space by using modulation transfer functions (MTFs), where the MTFs are estimated as a measure of the modulation transfer effect on the sub-band TMEs between the clean, reverberation, and reference spaces. Applying the MTFs to the modulation spectrum is intended to remove the differences in the modulation spectrum caused by differences between recording environments. The normalized modulation spectrum is then inverse Fourier transformed to restore the sub-band TMEs, retaining their original phase information. We tested the proposed method in speech recognition experiments in a reverberant room with varying speaker-to-microphone distance (SMD). As the baseline, we used traditional Mel-frequency cepstral coefficients with mean and variance normalization. Averaged over SMDs from 50 cm to 400 cm, sub-band TME processing alone gave a 44.96% relative improvement, and normalizing the modulation spectrum of the sub-band TMEs gave a further 15.68% relative improvement. In all, we obtained a 53.59% relative improvement, which was better than that obtained with other temporal filtering and normalization methods.
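
The three percentages in the abstract do not add up (44.96 + 15.68 is not 53.59), but they are consistent if read a certain way. The sketch below assumes that "relative improvement" means a relative reduction in error rate and that the 15.68% is measured against the TME-only result; the abstract does not define the metric, so both readings are assumptions. Under them the gains compose multiplicatively. The function name is hypothetical.

```python
# Assumption: each figure is a relative reduction in error rate, and each
# later reduction is measured against the result of the previous stage.
def compose_relative_improvements(*gains):
    """Combine successive relative error reductions (fractions in [0, 1])."""
    remaining = 1.0
    for g in gains:
        remaining *= 1.0 - g  # fraction of the error left after this stage
    return 1.0 - remaining


# TME processing alone, then normalization on top of it.
total = compose_relative_improvements(0.4496, 0.1568)
print(f"{total:.2%}")  # 53.59%
```

That is, 1 - (1 - 0.4496)(1 - 0.1568) ≈ 0.5359, which matches the overall figure quoted in the abstract.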

Research highlights
▶ Sub-band temporal envelopes contribute to speech intelligibility.
▶ Sub-band temporal envelopes are more robust to reverberation than the sub-band power energy.
▶ Normalization on sub-band temporal envelopes can reduce the mismatch caused by the reverberation environments.
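
The envelope extraction and normalization steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Brick-wall FFT masks stand in for the paper's band-pass and low-pass filters, whose designs the abstract does not give. The function names, the `env_cutoff` value, and the `gain` array are hypothetical. In particular, `gain` is only a placeholder for whatever MTF-derived mapping the paper estimates; how to estimate it is not shown here.

```python
import numpy as np


def subband_envelope(x, fs, f_lo, f_hi, env_cutoff=20.0):
    """Temporal modulation envelope of one sub-band (FFT-based sketch)."""
    n = len(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    spec = np.fft.fft(x)
    # Band-pass and Hilbert transform in one step: keep only the positive
    # frequencies inside [f_lo, f_hi) and double them to get the analytic
    # signal of the band-limited input.
    band = (freqs >= f_lo) & (freqs < f_hi)
    analytic = np.fft.ifft(np.where(band, 2.0 * spec, 0.0))
    envelope = np.abs(analytic)
    # Low-pass the envelope so that only slow modulations remain.
    env_spec = np.fft.rfft(envelope)
    env_freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    env_spec[env_freqs > env_cutoff] = 0.0
    return np.fft.irfft(env_spec, n=n)


def normalize_envelope(envelope, gain):
    """Rescale the modulation magnitude spectrum, keeping the original phase."""
    mod_spec = np.fft.rfft(envelope)
    magnitude, phase = np.abs(mod_spec), np.angle(mod_spec)
    normalized = gain * magnitude * np.exp(1j * phase)
    return np.fft.irfft(normalized, n=len(envelope))


# Toy input: a 1 kHz carrier with 4 Hz amplitude modulation, 1 s at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env = subband_envelope(x, fs, f_lo=900.0, f_hi=1100.0)

# Hypothetical gain: leave the mean (bin 0) alone, halve all modulations.
gain = np.ones(len(env) // 2 + 1)
gain[1:] = 0.5
restored = normalize_envelope(env, gain)
```

In this toy case the recovered envelope follows the 4 Hz modulation, and halving the non-zero modulation bins halves its depth while leaving its timing unchanged, because the phase is left untouched. A full front end would repeat the extraction over a series of constant-bandwidth bands.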

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 25, Issue 3, July 2011, Pages 571–584