Article Code | Journal Code | Publication Year | English Article | Full-Text Version
---|---|---|---|---
566001 | 1452024 | 2016 | 11-page PDF | Free download
• We present an F0 modeling method that captures intrinsic F0 properties using deep neural networks (DNNs) for statistical parametric speech synthesis.
• F0 trajectories are parameterized with an optimized discrete cosine transform (DCT) analysis to capture their long-term properties (see the sketch after this list).
• A group of DNNs describes the contributions of context features at different prosodic levels to the observed F0 contours, reflecting the additive nature of F0 generation.
• Two structures, cascade and parallel DNNs, are proposed and compared in our experiments.
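As a concrete illustration of the DCT parameterization, here is a minimal Python sketch: a (log-)F0 segment is compressed to its first few DCT coefficients and a smooth trajectory is reconstructed from them. The function names, the coefficient count, and the toy contour are assumptions for illustration; the paper's optimized DCT analysis is not specified in this abstract.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_parameterize(f0_contour, n_coeffs=10):
    """Compress a (log-)F0 trajectory to its first n_coeffs DCT
    coefficients, a compact long-term representation of the contour."""
    coeffs = dct(f0_contour, type=2, norm="ortho")
    return coeffs[:n_coeffs]

def dct_reconstruct(coeffs, length):
    """Reconstruct a smooth F0 trajectory from truncated DCT coefficients."""
    padded = np.zeros(length)
    padded[: len(coeffs)] = coeffs
    return idct(padded, type=2, norm="ortho")

# Usage: compress a 50-frame syllable-level F0 segment (toy contour, Hz)
# to 10 coefficients, then reconstruct a smooth approximation.
f0 = 120.0 + 20.0 * np.sin(np.linspace(0.0, np.pi, 50))
c = dct_parameterize(np.log(f0))
f0_hat = np.exp(dct_reconstruct(c, len(f0)))
```

Truncating the DCT keeps only the low-frequency shape of the contour, which is what makes it suitable for suprasegmental (syllable-, word-, and phrase-level) F0 features.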
This paper investigates F0 modeling of speech with deep neural networks (DNNs) for statistical parametric speech synthesis (SPSS). Recently, DNNs have been applied to acoustic modeling for SPSS and have shown good performance in characterizing the complex dependencies between contextual features and acoustic observations. However, the additive nature and long-term suprasegmental properties of F0 have not been fully exploited in existing DNN-based SPSS. Two model structures, cascade DNN and parallel DNN, are proposed to embody the hierarchical and additive properties of F0 in DNN-based prosody modeling. In the cascade structure, the DNN-predicted F0 contours of higher levels are used as input to the DNN of the current level. In the parallel structure, F0 components corresponding to different prosodic levels are generated by separate DNNs and added together to obtain the final F0 contour. An optimized discrete cosine transform (DCT) is used to extract long-term F0 features at the syllable, word, and phrase levels. The experimental results show that our approach yields better subjective performance than either the conventional HMM-based or DNN-based approaches. Among all compared systems, the parallel DNN achieves the best objective and subjective performance.
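To make the two structures concrete, the following is a minimal PyTorch sketch, assuming per-level context feature vectors as input and DCT coefficients of the F0 contour as output. The class names (LevelDNN, ParallelF0, CascadeF0), hidden sizes, and tanh activations are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LevelDNN(nn.Module):
    """Feed-forward net mapping one prosodic level's context features
    to DCT coefficients of that level's F0 component (sizes illustrative)."""
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class ParallelF0(nn.Module):
    """Parallel structure: one DNN per prosodic level; the per-level
    F0 components are summed to give the final contour parameters."""
    def __init__(self, dims, n_coeffs=10):
        super().__init__()
        self.levels = nn.ModuleList(LevelDNN(d, n_coeffs) for d in dims)

    def forward(self, feats):  # feats: list of per-level context tensors
        return sum(m(x) for m, x in zip(self.levels, feats))

class CascadeF0(nn.Module):
    """Cascade structure: each level's DNN also receives the F0
    prediction from the level above as additional input."""
    def __init__(self, dims, n_coeffs=10):
        super().__init__()
        self.levels = nn.ModuleList(
            LevelDNN(d + (n_coeffs if i > 0 else 0), n_coeffs)
            for i, d in enumerate(dims)
        )

    def forward(self, feats):  # feats ordered phrase -> word -> syllable
        y = self.levels[0](feats[0])
        for m, x in zip(self.levels[1:], feats[1:]):
            y = m(torch.cat([x, y], dim=-1))
        return y

# Usage with hypothetical phrase/word/syllable context dims of 30/40/50:
feats = [torch.randn(8, d) for d in (30, 40, 50)]
coeffs = ParallelF0([30, 40, 50])(feats)  # (8, 10) DCT coefficients
```

Under this sketch, the parallel variant realizes the additive view (level-wise components summed into one contour), while the cascade variant conditions each level on the prediction of the level above it.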
Journal: Speech Communication - Volume 76, February 2016, Pages 82–92