Article code: 558210
Journal code: 1451691
Publication year: 2016
English article: 14 pages
Full-text version: PDF
Title (ISI-indexed article)
Data driven articulatory synthesis with deep neural networks
Keywords
Articulatory synthesis; Electromagnetic articulography; Deep learning; Gaussian mixture model
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
Abstract


• We present an articulatory-to-acoustic mapping for real-time articulatory synthesis.
• The method uses a deep neural network with a tapped-delay input line.
• The tapped-delay line efficiently captures the dynamics of articulatory trajectories.
• The model achieved higher accuracy than competing models based on Gaussian mixture models.
• The improvement was also perceptible in a subjective listening test.
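The highlights describe the tapped-delay input line only at a high level. The sketch below is a minimal version built on several assumptions, none of which are stated in the source:

- each articulatory frame is a fixed-length list of sensor coordinates;
- the delay line feeds the network the current frame plus a few past frames;
- hidden layers use tanh and the output layer is linear.

The paper's actual window layout, frame shift, activations and edge handling may differ. All function names are hypothetical. Using only current and past frames keeps the mapping causal, which suits real-time synthesis.

```python
import math
import random
from typing import List, Sequence, Tuple

Frame = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def tapped_delay_features(frames: Sequence[Frame], num_delays: int) -> List[Frame]:
    """Concatenate each frame with its `num_delays` preceding frames.

    Hypothetical helper: taps are ordered oldest first, current frame last.
    Taps that would fall before the first frame reuse the first frame
    (an assumed edge-padding choice).
    """
    if num_delays < 0:
        raise ValueError("num_delays must be non-negative")
    stacked: List[Frame] = []
    for t in range(len(frames)):
        window: Frame = []
        for d in range(num_delays, -1, -1):
            window.extend(frames[max(t - d, 0)])
        stacked.append(window)
    return stacked


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Uniform random initialization scaled by 1/sqrt(n_in); weights are untrained."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_in: int, hidden: Sequence[int], n_out: int, seed: int = 0) -> List[Layer]:
    """Stack fully connected layers of the given widths, e.g. hidden=(512, 512)."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def mlp_forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feedforward pass: tanh on hidden layers, linear output layer (assumed)."""
    h = list(x)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = [math.tanh(v) for v in z] if i < len(layers) - 1 else z
    return h
```

As an illustration, take hypothetical 12-dimensional EMA frames and `num_delays=5`. Each input vector then holds 6 × 12 = 72 values, and if frames were 10 ms apart, those six frames would cover roughly 60 ms. Under those assumptions, `build_network(72, (512, 512), 25)` would mirror the two 512-neuron hidden layers reported in the abstract, with an assumed 25-coefficient acoustic output. Training the weights is outside this sketch.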

The conventional approach to data-driven articulatory synthesis models the joint acoustic-articulatory distribution with a Gaussian mixture model (GMM). A post-processing step then optimizes the resulting acoustic trajectories. This final step can significantly improve the accuracy of the GMM's frame-by-frame mapping. However, it is computationally intensive and requires the entire utterance to be synthesized beforehand, which makes it unsuitable for real-time synthesis. To address this issue, we present a deep neural network (DNN) articulatory synthesizer that uses a tapped-delay input line. This allows the model to capture context information in the articulatory trajectory without post-processing. We characterize the DNN as a function of context size and number of hidden layers. We then compare it against two GMM articulatory synthesizers: a baseline model that performs a simple frame-by-frame mapping, and a second model that also performs trajectory optimization. Our results show that a DNN with a 60-ms context window and two 512-neuron hidden layers can synthesize speech at four times the frame rate, a speed comparable to frame-by-frame mappings. It does so while improving on the accuracy of trajectory optimization (a 9.8% reduction in mel-cepstral distortion). Subjective evaluation through pairwise listening tests also shows a strong preference for the DNN articulatory synthesizer over GMM trajectory optimization.
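The 9.8% figure refers to mel-cepstral distortion (MCD). The abstract does not give the exact formula, so the sketch below assumes the commonly used definition. For one frame, the distortion in dB is (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²), where c and ĉ are the reference and synthesized mel-cepstral coefficients. By default it skips the energy coefficient c_0 and averages over time-aligned frames. The function names are hypothetical.

```python
import math
from typing import Sequence

MCD_SCALE = 10.0 / math.log(10.0)  # converts natural-log cepstral units to decibels


def frame_mcd(ref: Sequence[float], syn: Sequence[float], skip_c0: bool = True) -> float:
    """Mel-cepstral distortion (dB) between two cepstral frames of equal length."""
    if len(ref) != len(syn):
        raise ValueError("frames must have the same number of coefficients")
    start = 1 if skip_c0 else 0  # optionally ignore the energy term c_0
    sq = sum((r - s) ** 2 for r, s in zip(ref[start:], syn[start:]))
    return MCD_SCALE * math.sqrt(2.0 * sq)


def mean_mcd(
    ref_frames: Sequence[Sequence[float]],
    syn_frames: Sequence[Sequence[float]],
    skip_c0: bool = True,
) -> float:
    """Average frame MCD over two time-aligned utterances."""
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need two non-empty, time-aligned frame sequences")
    total = sum(frame_mcd(r, s, skip_c0) for r, s in zip(ref_frames, syn_frames))
    return total / len(ref_frames)


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` lowers distortion relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline
```

Read as a relative reduction, 9.8% would mean the DNN's average MCD is 0.902 times that of the GMM trajectory-optimization model. In that case, `relative_reduction(baseline, 0.902 * baseline)` returns approximately 9.8.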

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 36, March 2016, Pages 260–273
Authors
, ,