کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6960540 1452001 2018 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
GPR-based Thai speech synthesis using multi-level duration prediction
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال
پیش نمایش صفحه اول مقاله
GPR-based Thai speech synthesis using multi-level duration prediction
چکیده انگلیسی
This paper proposes a multi-level Gaussian process regression (GPR)-based method for duration prediction by incorporating phone- and syllable-level duration models. In this method, we first train the syllable model and predict syllable durations for a given input of context labels. Then, we use the predicted syllable duration as an additional context for the phone-level model to predict phone durations. To apply multi-level duration prediction to the GPR-based speech synthesis framework, we designed phone- and syllable- level context sets for Thai that include linguistic information and the relative positions of speech units. We also examined the multi-level deep neural network (DNN)-based duration-prediction method by using the same approach as for the proposed multi-level GPR-based one. We conducted objective and subjective evaluations using two-hour training data to compare the proposed method with single-level ones. The results indicate that the proposed multi-level duration-prediction method outperformed single-level ones in DNN-, and GPR-based frameworks. They also indicate that the proposed multi-level GPR-based method can provide better performance than the multi-level HMM-based duration-prediction method.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Speech Communication - Volume 99, May 2018, Pages 114-123
نویسندگان
, , ,