Data-driven articulatory synthesis with deep neural networks

Article code	Journal code	Year	Length
558210	1451691	2016	14-page PDF

English title

Data driven articulatory synthesis with deep neural networks



Keywords

Articulatory synthesis; Electromagnetic articulography; Deep learning; Gaussian mixture models

Related subjects

Engineering and Basic Sciences / Computer Engineering / Signal Processing


Abstract

• We present an articulatory-to-acoustic mapping for real-time articulatory synthesis.
• The method uses a deep neural network with a tapped-delay input line.
• The tapped-delay line efficiently captures dynamics in articulatory trajectories.
• The model achieved higher accuracy than competing models based on Gaussian mixtures.
• The improvement was also perceptible in a subjective listening test.
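The tapped-delay input line can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a causal window (each input row holds the current articulatory frame plus a fixed number of past frames), pads the start of the utterance by repeating the first frame, and uses tanh hidden units. The sizes (12 articulatory channels, 5 delay taps, 25 acoustic outputs) and the names `tapped_delay`, `init_mlp`, and `mlp_forward` are hypothetical; only the two 512-unit hidden layers come from the abstract. Because each row depends only on frames already observed, the network can emit one acoustic frame per incoming articulatory frame, with no utterance-level post-processing.

```python
import numpy as np


def tapped_delay(frames: np.ndarray, delays: int) -> np.ndarray:
    """Build tapped-delay-line inputs from a (T, D) articulatory trajectory.

    Row t of the result is [x_t, x_{t-1}, ..., x_{t-delays}], so each input
    sees only the current and past frames (a causal window, assumed here).
    Frames before the start are padded by repeating the first frame.
    """
    T, _ = frames.shape
    pad = np.repeat(frames[:1], delays, axis=0)
    padded = np.vstack([pad, frames])
    # padded[t + delays - k] holds x_{t-k}
    taps = [padded[delays - k : delays - k + T] for k in range(delays + 1)]
    return np.hstack(taps)


def init_mlp(sizes, rng):
    """Random weights for a fully connected net with layer widths `sizes`."""
    weights = [
        rng.standard_normal((m, n)) / np.sqrt(m)
        for m, n in zip(sizes[:-1], sizes[1:])
    ]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases


def mlp_forward(x, weights, biases):
    """Hidden layers use tanh (an assumption); the output layer is linear."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return h @ weights[-1] + biases[-1]


# Hypothetical sizes: 12 articulatory channels, 5 delay taps, 25 acoustic outputs.
rng = np.random.default_rng(0)
ema = rng.standard_normal((100, 12))   # random stand-in for an EMA trajectory
X = tapped_delay(ema, delays=5)        # shape (100, 72)
weights, biases = init_mlp([X.shape[1], 512, 512, 25], rng)
Y = mlp_forward(X, weights, biases)    # shape (100, 25)
```

Widening `delays` enlarges the context window (and the input layer) linearly; the abstract reports results as a function of this context size and the number of hidden layers.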

The conventional approach to data-driven articulatory synthesis models the joint acoustic-articulatory distribution with a Gaussian mixture model (GMM), followed by a post-processing step that optimizes the resulting acoustic trajectories. This final step can significantly improve the accuracy of the GMM frame-by-frame mapping, but it is computationally intensive and requires that the entire utterance be synthesized beforehand, making it unsuited for real-time synthesis. To address this issue, we present a deep neural network (DNN) articulatory synthesizer that uses a tapped-delay input line, allowing the model to capture context information in the articulatory trajectory without the need for post-processing. We characterize the DNN as a function of the context size and number of hidden layers, and compare it against two GMM articulatory synthesizers: a baseline model that performs a simple frame-by-frame mapping, and a second model that also performs trajectory optimization. Our results show that a DNN with a 60-ms context window and two 512-neuron hidden layers can synthesize speech at four times the frame rate, comparable to frame-by-frame mappings, while improving on the accuracy of trajectory optimization (a 9.8% reduction in Mel cepstral distortion). Subjective evaluation through pairwise listening tests also shows a strong preference for the DNN articulatory synthesizer over GMM trajectory optimization.
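The accuracy figure above is stated as a relative reduction in Mel cepstral distortion (MCD). A minimal sketch of that metric follows, assuming the widely used definition MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²) per frame, averaged over time-aligned frames and excluding the energy coefficient c0; the paper's exact variant is not given here. The function names and the 5.0 dB / 4.51 dB values are hypothetical, chosen only to show what a 9.8% reduction means numerically.

```python
import numpy as np


def mel_cepstral_distortion(ref: np.ndarray, syn: np.ndarray) -> float:
    """Frame-averaged MCD (dB) between two time-aligned (T, D) mel-cepstra.

    Skips c0 (the energy term), a common convention assumed here.
    """
    diff = ref[:, 1:] - syn[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff**2, axis=1))
    return float(np.mean(per_frame))


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical MCD values: a 5.0 dB baseline reduced to 4.51 dB is about a 9.8% reduction.
print(relative_reduction(5.0, 4.51))
```

Because the reduction is relative, the same 9.8% corresponds to different absolute dB gaps depending on the baseline MCD.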

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 36, March 2016, Pages 260–273

Authors

Sandesh Aryal, Ricardo Gutierrez-Osuna
