Free article download: Multi-Channel Speech Enhancement and Amplitude Modulation Analysis for Noise Robust Automatic Speech Recognition

Article ID	Journal ID	Publication year	English article	Full-text version
4973731	1451681	2017	16-page PDF	Free download

English title of the ISI article

Multi-Channel Speech Enhancement and Amplitude Modulation Analysis for Noise Robust Automatic Speech Recognition

Keywords

CHiME; Feature extraction; Non-negative matrix factorization; Speech enhancement

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing

Abstract

The paper describes a system for automatic speech recognition (ASR) that is benchmarked with data of the 3rd CHiME challenge, a dataset comprising distant microphone recordings of noisy acoustic scenes in public environments. The proposed ASR system employs various methods to increase recognition accuracy and noise robustness. Two different multi-channel speech enhancement techniques are used to eliminate interfering sounds in the audio stream. One speech enhancement method aims at separating the target speaker's voice from background sources based on non-negative matrix factorization (NMF) using variational Bayesian (VB) inference to estimate NMF parameters. The second technique is based on a time-varying minimum variance distortionless response (MVDR) beamformer that uses spatial information to suppress sound signals not arriving from a desired direction. Prior to speech enhancement, a microphone channel failure detector is applied that is based on cross-comparing channels using a modulation-spectral representation of the speech signal. ASR feature extraction employs the amplitude modulation filter bank (AMFB) that implicates prior information of speech to analyze its temporal dynamics. AMFBs outperform the commonly used frame splicing technique of filter bank features in conjunction with a deep neural network (DNN) based ASR system, which denotes an equivalent data-driven approach to extract modulation-spectral information. In addition, features are speaker adapted, a recurrent neural network (RNN) is employed for language modeling, and hypotheses of different ASR systems are combined to further enhance the recognition accuracy. The proposed ASR system achieves an absolute word error rate (WER) of 5.67% on the real evaluation test data, which is 0.16% lower compared to the best score reported within the 3rd CHiME challenge.
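The abstract reports its result as an absolute word error rate (WER) and an absolute improvement over the best challenge score. As a minimal sketch of how such a figure is commonly computed (this is not the authors' scoring tool; the function names and the example sentences are hypothetical), WER can be taken as the word-level edit distance between reference and hypothesis divided by the number of reference words. The second helper converts the abstract's 0.16-point absolute gain into a relative one, assuming the best challenge score was 5.67% + 0.16% = 5.83%, which follows from the numbers in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Figures from the abstract: 5.67% WER, 0.16 points below the best challenge score.
system_wer = 5.67
best_challenge_wer = system_wer + 0.16
gain = relative_reduction(best_challenge_wer, system_wer)

print(f"example WER: {example_wer:.3f}")
print(f"relative reduction vs. best challenge score: {gain:.1%}")
```

Read this way, the 0.16-point absolute gain corresponds to a relative reduction of roughly 2.7% against the best challenge score.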

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 46, November 2017, Pages 558-573

Authors

Niko Moritz, Kamil Adiloğlu, Jörn Anemüller, Stefan Goetze, Birger Kollmeier
