Article code: 566001
Journal code: 1452024
Publication year: 2016
English article: 11 pages
Full-text version: PDF
English title of the ISI article
Modeling F0 trajectories in hierarchically structured deep neural networks
Related topics
Engineering and Basic Sciences / Computer Engineering / Signal Processing
Abstract


• We present an F0 modeling method for statistical parametric speech synthesis that uses deep neural networks (DNNs) and takes the intrinsic properties of F0 into account.
• The F0 trajectories are parameterized using an optimized discrete cosine transform (DCT) analysis to capture long-term F0 properties.
• A group of DNNs is used to model the contributions of contextual features at different prosodic levels to the observed F0 contours, reflecting the additive nature of F0 generation.
• Two structures, cascade and parallel DNNs, are proposed and compared in our experiments.
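The DCT parameterization mentioned in the bullet points above can be illustrated with a plain orthonormal DCT-II applied to one segment of (log-)F0 frames: keeping only the first few coefficients gives a compact, smooth description of the segment's shape. This is a minimal sketch under stated assumptions: it uses the textbook DCT rather than the paper's optimized variant, whose details are not given here; it assumes unvoiced frames have already been interpolated; and the names `dct2` and `idct2` and the sample values are illustrative only.

```python
import math
from typing import List, Sequence


def dct2(contour: Sequence[float], n_coeffs: int) -> List[float]:
    """Orthonormal DCT-II of one F0 segment, keeping only the first n_coeffs terms.

    Assumption: `contour` holds continuous (already interpolated) log-F0 frames.
    """
    n = len(contour)
    coeffs: List[float] = []
    for k in range(min(n_coeffs, n)):
        # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(contour))
        coeffs.append(scale * total)
    return coeffs


def idct2(coeffs: Sequence[float], length: int) -> List[float]:
    """Inverse transform (orthonormal DCT-III) rebuilding `length` frames.

    Missing high-order coefficients are treated as zero, so a truncated
    coefficient vector yields a smoothed version of the original segment.
    """
    out: List[float] = []
    for i in range(length):
        value = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / length) if k == 0 else math.sqrt(2.0 / length)
            value += scale * c * math.cos(math.pi / length * (i + 0.5) * k)
        out.append(value)
    return out


if __name__ == "__main__":
    # Hypothetical log-F0 values for one six-frame syllable.
    syllable = [5.0, 5.1, 5.2, 5.15, 5.05, 4.95]
    coeffs = dct2(syllable, 3)
    print(coeffs)
    print(idct2(coeffs, len(syllable)))
```

When `n_coeffs` equals the segment length the round trip is exact; truncating to a handful of coefficients keeps the overall level (coefficient 0 is proportional to the segment mean), the overall rise or fall, and the coarse curvature, while discarding frame-level detail. The same transform could be applied per segment at each prosodic level.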

This paper investigates F0 modeling of speech with deep neural networks (DNNs) for statistical parametric speech synthesis (SPSS). DNNs have recently been applied to acoustic modeling in SPSS and have performed well in characterizing complex dependencies between contextual features and acoustic observations. However, the additive nature and long-term suprasegmental properties of F0 have not been fully exploited in existing DNN-based SPSS. Two model structures, a cascade DNN and a parallel DNN, are proposed to embody the hierarchical and additive properties of F0 in DNN-based prosody modeling. In the cascade structure, the F0 contours predicted by the DNNs of higher levels are used as input to the DNN of the current level. In the parallel structure, F0 components corresponding to different prosodic levels are generated separately by DNNs and added together to obtain the final F0 contour. An optimized discrete cosine transform (DCT) is used to extract long-term F0 features at the syllable, word, and phrase levels. Experimental results show that the proposed approach yields better subjective performance than either the conventional hidden Markov model (HMM) approach or the conventional DNN approach. Among all compared systems, the parallel DNN achieves the best objective and subjective performance.
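The difference between the cascade and parallel structures described in the abstract can be contrasted in a minimal Python sketch. The per-level models here are hypothetical stand-ins (any callable that returns a list of frame values) rather than trained DNNs. All levels are assumed to be sampled on the same frame grid, and in the cascade case the finest (syllable-level) output is assumed to be the final F0 contour; the abstract does not specify these details. The names `parallel_f0`, `cascade_f0`, and `LEVELS` are illustrative.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a trained per-level DNN: it maps an input vector
# to an F0 contribution over the frames of the current unit.
LevelModel = Callable[[Sequence[float]], List[float]]

# Coarse-to-fine prosodic levels (assumed ordering).
LEVELS = ("phrase", "word", "syllable")


def parallel_f0(models: Dict[str, LevelModel],
                contexts: Dict[str, Sequence[float]]) -> List[float]:
    """Parallel structure: each level predicts its own F0 component from its
    own context features, and the components are summed frame by frame."""
    components = [models[lvl](contexts[lvl]) for lvl in LEVELS]
    return [sum(frame) for frame in zip(*components)]


def cascade_f0(models: Dict[str, LevelModel],
               contexts: Dict[str, Sequence[float]]) -> List[float]:
    """Cascade structure: each level's input is its own context features plus
    the F0 contours already predicted at all higher (coarser) levels."""
    higher: List[float] = []       # concatenated predictions of coarser levels
    prediction: List[float] = []
    for lvl in LEVELS:
        prediction = models[lvl](list(contexts[lvl]) + higher)
        higher = higher + prediction
    # Assumption: the finest level's output is taken as the final contour.
    return prediction
```

The parallel form keeps each level's contribution explicit, which matches the additive view of F0 generation, while the cascade form lets finer levels condition on what the coarser levels have already predicted.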

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 76, February 2016, Pages 82–92