Article ID | Journal ID | Publication Year | English Paper | Full-Text Version
---|---|---|---|---
4973729 | 1451681 | 2017 | 20-page PDF | Free download
English Title of the ISI Paper
An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech
Related Topics
Engineering and Basic Sciences
Computer Engineering
Signal Processing
English Abstract
We present an information fusion approach to the robust recognition of multi-microphone speech. It is based on a deep learning framework with a large deep neural network (DNN) consisting of subnets designed from different perspectives. Multiple knowledge sources are then reasonably integrated via an early fusion of normalized noisy features with multiple beamforming techniques, enhanced speech features, speaker-related features, and other auxiliary features concatenated as the input to each subnet to compensate for imperfect front-end processing. Furthermore, a late fusion strategy is utilized to leverage the complementary natures of the different subnets by combining the outputs of all subnets to produce a single output set. Testing on the CHiME-3 task of recognizing microphone array speech, we demonstrate in our empirical study that the different information sources complement each other and that both early and late fusions provide significant performance gains, with an overall word error rate (WER) of 10.55% when combining 12 systems. Furthermore, by utilizing an improved technique for beamforming and a powerful recurrent neural network (RNN)-based language model for rescoring, a WER of 9.08% can be achieved for the best single DNN system with one-pass decoding among all of the systems submitted to the CHiME-3 challenge.
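The abstract describes two fusion stages: an early fusion that concatenates multi-channel, enhanced, speaker-related, and auxiliary features into a single input vector for each subnet, and a late fusion that combines the frame-level outputs of several subnets into one output set. The following is a minimal NumPy sketch of these two operations; the feature types, dimensions, and equal-weight averaging are illustrative assumptions, not the authors' actual CHiME-3 implementation.

```python
import numpy as np

def early_fusion(feature_list):
    """Early fusion: concatenate per-frame feature streams along the feature axis.

    Each element is assumed to be a (num_frames, dim) array for the same utterance,
    e.g. beamformed log-Mel, enhanced log-Mel, and frame-tiled speaker features.
    """
    num_frames = feature_list[0].shape[0]
    assert all(f.shape[0] == num_frames for f in feature_list)
    return np.concatenate(feature_list, axis=1)

def late_fusion(posterior_list, weights=None):
    """Late fusion: weighted average of per-frame posteriors from multiple subnets."""
    stacked = np.stack(posterior_list, axis=0)        # (num_systems, T, num_classes)
    if weights is None:
        weights = np.full(stacked.shape[0], 1.0 / stacked.shape[0])
    weights = np.asarray(weights).reshape(-1, 1, 1)
    fused = (weights * stacked).sum(axis=0)
    return fused / fused.sum(axis=1, keepdims=True)   # renormalize per frame

if __name__ == "__main__":
    T = 100
    beamformed = np.random.randn(T, 40)                  # hypothetical beamformed features
    enhanced = np.random.randn(T, 40)                    # hypothetical enhanced-speech features
    speaker = np.tile(np.random.randn(1, 100), (T, 1))   # hypothetical speaker-related features
    x = early_fusion([beamformed, enhanced, speaker])
    print(x.shape)                                       # (100, 180)

    post_a = np.random.dirichlet(np.ones(500), size=T)   # subnet A frame posteriors
    post_b = np.random.dirichlet(np.ones(500), size=T)   # subnet B frame posteriors
    fused = late_fusion([post_a, post_b])
    print(fused.shape, fused.sum(axis=1)[:3])            # each frame still sums to 1
```

In this sketch the late fusion simply averages posteriors with equal weights; system-dependent weighting or combination at the lattice level would follow the same pattern.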
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 46, November 2017, Pages 517-534
Authors
Yan-Hui Tu, Jun Du, Qing Wang, Xiao Bao, Li-Rong Dai, Chin-Hui Lee