Article Code | Journal Code | Publication Year | English Article | Full-Text Version
---|---|---|---|---
566001 | 1452024 | 2016 | 11-page PDF | Free download
• We present an F0 modeling method that captures intrinsic F0 properties using deep neural networks (DNNs) for statistical parametric speech synthesis.
• F0 trajectories are parameterized with an optimized discrete cosine transform (DCT) analysis to capture their long-term properties (see the sketch after this list).
• A group of DNNs describes the contributions of context features at different prosodic levels to the observed F0 contours, reflecting the additive nature of F0 generation.
• Two structures, cascade and parallel DNNs, are proposed and compared in our experiments.
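As a concrete illustration of the DCT parameterization, here is a minimal Python sketch: a (log-)F0 segment is compressed to its first few DCT coefficients and a smooth trajectory is reconstructed from them. The function names, the coefficient count, and the toy contour are assumptions for illustration; the paper's optimized DCT analysis is not specified in this abstract.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_parameterize(f0_contour, n_coeffs=10):
    """Compress a (log-)F0 trajectory to its first n_coeffs DCT
    coefficients, a compact long-term representation of the contour."""
    coeffs = dct(f0_contour, type=2, norm="ortho")
    return coeffs[:n_coeffs]

def dct_reconstruct(coeffs, length):
    """Reconstruct a smooth F0 trajectory from truncated DCT coefficients."""
    padded = np.zeros(length)
    padded[: len(coeffs)] = coeffs
    return idct(padded, type=2, norm="ortho")

# Usage: compress a 50-frame syllable-level F0 segment (toy contour, Hz)
# to 10 coefficients, then reconstruct a smooth approximation.
f0 = 120.0 + 20.0 * np.sin(np.linspace(0.0, np.pi, 50))
c = dct_parameterize(np.log(f0))
f0_hat = np.exp(dct_reconstruct(c, len(f0)))
```

Truncating the DCT keeps only the low-frequency shape of the contour, which is what makes it suitable for suprasegmental (syllable-, word-, and phrase-level) F0 features.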
This paper investigates F0 modeling of speech with deep neural networks (DNNs) for statistical parametric speech synthesis (SPSS). Recently, DNNs have been applied to acoustic modeling for SPSS and have shown good performance in characterizing the complex dependencies between contextual features and acoustic observations. However, the additive nature and long-term suprasegmental properties of F0 have not been fully exploited in existing DNN-based SPSS. Two model structures, cascade DNN and parallel DNN, are proposed to embody the hierarchical and additive properties of F0 in DNN-based prosody modeling. In the cascade structure, the DNN-predicted F0 contours of higher levels are used as input to the DNN of the current level. In the parallel structure, F0 components corresponding to different prosodic levels are generated by separate DNNs and added together to obtain the final F0 contour. An optimized discrete cosine transform (DCT) is used to extract long-term F0 features at the syllable, word, and phrase levels. The experimental results show that our approach yields better subjective performance than either the conventional HMM-based or DNN-based approaches. Among all compared systems, the parallel DNN achieves the best objective and subjective performance.
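To make the two structures concrete, the following is a minimal PyTorch sketch, assuming per-level context feature vectors as input and DCT coefficients of the F0 contour as output. The class names (LevelDNN, ParallelF0, CascadeF0), hidden sizes, and tanh activations are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LevelDNN(nn.Module):
    """Feed-forward net mapping one prosodic level's context features
    to DCT coefficients of that level's F0 component (sizes illustrative)."""
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class ParallelF0(nn.Module):
    """Parallel structure: one DNN per prosodic level; the per-level
    F0 components are summed to give the final contour parameters."""
    def __init__(self, dims, n_coeffs=10):
        super().__init__()
        self.levels = nn.ModuleList(LevelDNN(d, n_coeffs) for d in dims)

    def forward(self, feats):  # feats: list of per-level context tensors
        return sum(m(x) for m, x in zip(self.levels, feats))

class CascadeF0(nn.Module):
    """Cascade structure: each level's DNN also receives the F0
    prediction from the level above as additional input."""
    def __init__(self, dims, n_coeffs=10):
        super().__init__()
        self.levels = nn.ModuleList(
            LevelDNN(d + (n_coeffs if i > 0 else 0), n_coeffs)
            for i, d in enumerate(dims)
        )

    def forward(self, feats):  # feats ordered phrase -> word -> syllable
        y = self.levels[0](feats[0])
        for m, x in zip(self.levels[1:], feats[1:]):
            y = m(torch.cat([x, y], dim=-1))
        return y

# Usage with hypothetical phrase/word/syllable context dims of 30/40/50:
feats = [torch.randn(8, d) for d in (30, 40, 50)]
coeffs = ParallelF0([30, 40, 50])(feats)  # (8, 10) DCT coefficients
```

Under this sketch, the parallel variant realizes the additive view (level-wise components summed into one contour), while the cascade variant conditions each level on the prediction of the level above it.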
Journal: Speech Communication - Volume 76, February 2016, Pages 82–92