Article code: 558354
Journal code: 874908
Publication year: 2013
English article: 23-page PDF
Full-text version: free download
English title of the ISI article
Blind source extraction for robust speech recognition in multisource noisy environments
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
English abstract

This paper proposes and describes a complete system for Blind Source Extraction (BSE). The goal is to extract a target signal source in order to recognize spoken commands uttered in reverberant and noisy environments and acquired by a microphone array. The architecture of the BSE system is based on multiple stages: (a) TDOA estimation, (b) mixing system identification for the target source, (c) on-line semi-blind source separation, and (d) source extraction. All the stages are effectively combined, allowing the estimation of the target signal with limited distortion. While a generalization of the BSE framework is described, here the proposed system is evaluated on the data provided for the CHiME Pascal 2011 competition, i.e. binaural recordings made in a real-world domestic environment. The CHiME mixtures are processed with the BSE system and the recovered target signal is fed to a recognizer, which uses noise-robust features based on Gammatone Frequency Cepstral Coefficients. Moreover, acoustic model adaptation is applied to further reduce the mismatch between training and testing data and improve the overall performance. A detailed comparison between different models and algorithmic settings is reported, showing that the approach is promising and that the resulting system gives a significant reduction of the error rate.


► A system for Blind Source Extraction (BSE) is implemented and applied to increase the robustness of an automatic spoken-command recognition task.
► The system is based on multiple parallel stages: (a) target/noise source TDOA estimation, (b) blind target mixing system estimation, (c) semi-blind source separation and (d) source signal enhancement.
► The BSE system is combined with an ASR system based on robust feature analysis.
► Performance is evaluated with acoustic models derived directly from noise-free data and with models adapted to the output signal of the BSE processing.
► Experimental results show that the proposed BSE system enables command recognition in adverse conditions even when clean models are adopted, i.e. the enhanced signals have limited distortion, so model adaptation is not strictly required.
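
As an illustration of stage (a), TDOA estimation between the two channels of a binaural recording can be sketched as below. The abstract does not say which estimator the system uses, so this sketch assumes GCC-PHAT (generalized cross-correlation with phase transform), a common choice that is swapped in here purely as an example. The function name `gcc_phat_tdoa`, its parameters, and the 16 kHz sample rate in the demo are hypothetical and do not come from the paper.

```python
import numpy as np


def gcc_phat_tdoa(x_left, x_right, sample_rate, max_delay_s=None):
    """Estimate the delay (seconds) of x_right relative to x_left.

    Illustrative GCC-PHAT sketch; not the estimator described in the paper.
    A positive result means the right channel lags the left channel.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x_left) + len(x_right)
    spec_left = np.fft.rfft(x_left, n=n)
    spec_right = np.fft.rfft(x_right, n=n)

    # Cross-power spectrum, whitened so that only phase information remains.
    cross = spec_right * np.conj(spec_left)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if a bound is given.
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = min(max_shift, int(round(max_delay_s * sample_rate)))
    max_shift = max(1, max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / sample_rate


if __name__ == "__main__":
    # Synthetic demo: the right channel is the left channel delayed by 5 samples.
    rng = np.random.default_rng(0)
    left = rng.standard_normal(4096)
    right = np.concatenate((np.zeros(5), left[:-5]))
    tdoa = gcc_phat_tdoa(left, right, sample_rate=16000)  # assumed sample rate
    print(f"estimated delay: {tdoa * 16000:.0f} samples")
```

In a multisource setting such as the one the highlights describe, a single peak of this correlation would not be enough: separating target and noise TDOAs needs several peaks or a tracking step, which this sketch leaves out.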

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 3, May 2013, Pages 703–725