Free article download: Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition

Article code	Journal code	Publication year	English article	Full-text version
4977829	1452011	2017	10-page PDF	Free download

English title of the ISI article

Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition

Keywords

Automatic speech recognition, articulatory trajectories, vocal tract variables, hybrid convolutional neural networks, time-frequency convolution, convolutional neural networks

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing

Abstract

Studies have shown that articulatory information helps model speech variability and, consequently, improves speech recognition performance. But learning speaker-invariant articulatory models is challenging, as speaker-specific signatures in both the articulatory and acoustic space increase complexity of speech-to-articulatory mapping, which is already an ill-posed problem due to its inherent nonlinearity and non-unique nature. This work explores using deep neural networks (DNNs) and convolutional neural networks (CNNs) for mapping speech data into its corresponding articulatory space. Our speech-inversion results indicate that the CNN models perform better than their DNN counterparts. In addition, we use these inverse-models to generate articulatory information from speech for two separate speech recognition tasks: the WSJ1 and Aurora-4 continuous speech recognition tasks. This work proposes a hybrid convolutional neural network (HCNN), where two parallel layers are used to jointly model the acoustic and articulatory spaces, and the decisions from the parallel layers are fused at the output context-dependent (CD) state level. The acoustic model performs time-frequency convolution on filterbank-energy-level features, whereas the articulatory model performs time convolution on the articulatory features. The performance of the proposed architecture is compared to that of the CNN- and DNN-based systems using gammatone filterbank energies as acoustic features, and the results indicate that the HCNN-based model demonstrates lower word error rates compared to the CNN/DNN baseline systems.
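
The two-stream idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy sizes, the single kernel per stream, the absence of pooling and hidden layers, and all function and parameter names (`conv_time_freq`, `conv_time`, `hcnn_posteriors`, `init_params`) are hypothetical. In particular, fusing the streams by summing their pre-softmax scores over the CD states is an assumption made here for illustration; the abstract only says that the decisions are fused at the output CD-state level.

```python
import math
import random


def conv_time_freq(spec, kernel):
    """Valid 2-D cross-correlation over (time, frequency) of a T x F filterbank map."""
    kt, kf = len(kernel), len(kernel[0])
    n_t, n_f = len(spec), len(spec[0])
    return [[sum(spec[t + i][f + j] * kernel[i][j]
                 for i in range(kt) for j in range(kf))
             for f in range(n_f - kf + 1)]
            for t in range(n_t - kt + 1)]


def conv_time(traj, kernel):
    """Valid 1-D cross-correlation along time only, per articulatory channel (T x C input)."""
    k = len(kernel)
    n_t, n_c = len(traj), len(traj[0])
    return [[sum(traj[t + i][c] * kernel[i] for i in range(k))
             for c in range(n_c)]
            for t in range(n_t - k + 1)]


def relu_flatten(fmap):
    """Apply ReLU and flatten a 2-D feature map into a vector."""
    return [max(0.0, v) for row in fmap for v in row]


def linear(x, weights):
    """Bias-free linear layer; `weights` holds one row of len(x) per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def hcnn_posteriors(spec, traj, params):
    """Return posteriors over CD states from the two parallel streams."""
    acoustic = relu_flatten(conv_time_freq(spec, params["tf_kernel"]))
    artic = relu_flatten(conv_time(traj, params["t_kernel"]))
    scores_acoustic = linear(acoustic, params["w_acoustic"])
    scores_artic = linear(artic, params["w_artic"])
    # Assumed fusion rule: add the per-state scores, then normalize.
    fused = [a + b for a, b in zip(scores_acoustic, scores_artic)]
    return softmax(fused)


def init_params(T, F, C, n_states, kt=3, kf=3, seed=0):
    """Random toy weights for T frames, F filterbank bins, C articulatory channels."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    n_acoustic = (T - kt + 1) * (F - kf + 1)
    n_artic = (T - kt + 1) * C
    return {
        "tf_kernel": [rand(kf) for _ in range(kt)],
        "t_kernel": rand(kt),
        "w_acoustic": [rand(n_acoustic) for _ in range(n_states)],
        "w_artic": [rand(n_artic) for _ in range(n_states)],
    }
```

Calling `hcnn_posteriors(spec, traj, init_params(T=5, F=8, C=3, n_states=4))` with a 5 x 8 filterbank patch and a 5 x 3 articulatory patch returns four values that sum to 1. The point of the sketch is only the shape of the computation: the acoustic kernel slides along both time and frequency, the articulatory kernel slides along time alone, and the two streams meet only at the output layer.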

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 89, May 2017, Pages 103-112

Authors

Vikramjit Mitra, Ganesh Sivaraman, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Mark Tiede
