Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer

Article code	Journal code	Publication year	English article	Full-text version
563118	875471	2013	19-page PDF	Free download

Keywords

Speech enhancement - Model adaptation - Robust speech recognition

Related topics

Engineering and basic sciences, Computer engineering, Signal processing

Abstract

A conventional approach to noise-robust speech recognition is to apply a speech enhancement pre-processor prior to recognition. However, such a pre-processor usually introduces artifacts that limit the improvement in recognition performance. In this paper we discuss a framework for improving the interconnection between speech enhancement pre-processors and a recognizer. The framework relies on recent proposals for increasing robustness by replacing the point estimate of the enhanced features with a distribution that has a dynamic (i.e. time-varying) feature variance. We have recently proposed a model in which the dynamic feature variance is the product of a dynamic feature variance root, obtained from the pre-processor, and a weight representing the pre-processor uncertainty; this weight is optimized using adaptation data. The formulation of the method is general and could be used with any speech enhancement pre-processor. However, we observed that with noise reduction based on spectral subtraction or related approaches, adaptation could fail because the proposed model represents the actual dynamic feature variance poorly. The dynamic feature variance changes according to the level of the speech sound, which varies with the HMM states. Therefore, we propose improving the model by introducing HMM state dependency. We achieve this with a cluster-based representation: the Gaussians of the acoustic model are grouped into clusters, and a different pre-processor uncertainty weight is associated with each cluster. Experiments with various pre-processors and recognition tasks demonstrate the generality of the proposed integration scheme and show that the proposed extension improves performance with various speech enhancement pre-processors.

► We investigate a scheme for interconnecting a speech enhancement pre-processor and a recognizer.
► The method uses adaptation data to optimize a dynamic model of the feature variance.
► We extend the dynamic feature variance model to a cluster-based representation.
► Significant performance improvement is demonstrated for various recognition tasks.
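
The cluster-based variance model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussians, assumes the dynamic variance is simply added to each Gaussian's model variance (as in common uncertainty-decoding formulations), and uses hypothetical names (`root_var`, `weights`, `cluster_of_gaussian`). How the root variance is obtained from the pre-processor and how the weights are optimized on adaptation data are not shown.

```python
import numpy as np

def dynamic_variance(root_var, weights, cluster_of_gaussian):
    """Per-frame, per-Gaussian dynamic variance (assumed form: weight * root).

    root_var:            (T, D) dynamic variance root from the pre-processor
    weights:             (C,)   one uncertainty weight per cluster
    cluster_of_gaussian: (G,)   cluster index of each Gaussian
    returns:             (T, G, D)
    """
    per_gaussian_weight = weights[cluster_of_gaussian]          # (G,)
    return per_gaussian_weight[None, :, None] * root_var[:, None, :]

def log_likelihood(x_hat, root_var, means, variances, weights, cluster_of_gaussian):
    """Log-likelihood of enhanced features under variance-compensated Gaussians.

    x_hat:     (T, D) enhanced features (point estimates)
    means:     (G, D) Gaussian means of the acoustic model
    variances: (G, D) diagonal Gaussian variances of the acoustic model
    returns:   (T, G)
    """
    # Assumption: the dynamic variance is added to the model variance.
    total = variances[None, :, :] + dynamic_variance(root_var, weights, cluster_of_gaussian)
    diff = x_hat[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2.0 * np.pi * total) + diff ** 2 / total, axis=-1)

# Toy data: 1 frame, 1 dimension, 2 Gaussians assigned to 2 different clusters.
x_hat = np.array([[3.0]])
root_var = np.array([[4.0]])
means = np.array([[0.0], [0.0]])
variances = np.array([[1.0], [1.0]])
cluster_of_gaussian = np.array([0, 1])
weights = np.array([0.0, 1.0])   # cluster 0: no compensation, cluster 1: full weight

ll = log_likelihood(x_hat, root_var, means, variances, weights, cluster_of_gaussian)
```

In this toy case the two Gaussians are identical except for their cluster weight, so the difference between `ll[0, 0]` and `ll[0, 1]` comes only from the cluster-dependent variance compensation. A single global weight corresponds to the special case of one cluster.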

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 1, January 2013, Pages 350–368

Authors

Marc Delcroix, Shinji Watanabe, Tomohiro Nakatani, Atsushi Nakamura
