Article code | Journal code | Publication year | English article | Full-text version
4973739 | 1451685 | 2017 | 21-page PDF | Free download
English title of the ISI article
Generation of creaky voice for improving the quality of HMM-based speech synthesis
Keywords
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract
This paper aims at developing an HMM-based speech synthesis system capable of generating creaky voice in addition to modal voice. Generation of creaky voice is carried out by addressing two main issues, namely, an automatic prediction of creaky voice and appropriate modelling of the excitation signal of creaky voice. An automatic creaky voice detection method is proposed based on the analysis of variation of epoch parameters for different voicing regions. A neural network classifier is trained using the variances of epoch parameters for detection of creaky regions. A hybrid source model which is an extension of recently developed time-domain deterministic plus noise model is proposed for modelling creaky excitation signal. In the proposed hybrid source model, the pitch-synchronous analysis is performed on the creaky excitation signal of every phone. From the creaky residual frames of every phonetic class, the deterministic and noise components are estimated. The creaky deterministic components of all phonetic classes are stored in the database. The noise components are parameterized in terms of spectral and amplitude envelopes and are modelled by HMMs. During synthesis, the appropriate deterministic component is selected from the database, and the noise component is constructed from the parameters generated from HMMs. The creaky deterministic and noise components are pitch-synchronously overlap-added to produce the creaky excitation signal. Subjective evaluation results indicate that the incorporation of creaky voice has improved the naturalness of the synthetic speech of two male speakers, and the quality is slightly better than the basic time-domain deterministic plus noise model meant for only modal excitation.
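The synthesis step described in the abstract, where a deterministic component and a noise component are overlap-added pitch-synchronously, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `overlap_add_excitation`, the Hann window, the nearest-neighbour resampling, and the scalar `noise_gain` (standing in for the HMM-generated spectral and amplitude envelopes) are all assumptions made for illustration. The irregular spacing of the example pitch marks is only meant to mimic the irregular periodicity of creaky voice.

```python
import math
import random


def hann(n):
    """Symmetric Hann window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def overlap_add_excitation(epochs, deterministic, noise_gain, length, seed=0):
    """Overlap-add one windowed frame per interior pitch mark.

    epochs:        sorted sample indices of pitch marks (hypothetical input)
    deterministic: stored deterministic waveform, resampled to each frame
    noise_gain:    scalar stand-in for the modelled noise envelopes (assumption)
    length:        number of output samples
    """
    rng = random.Random(seed)
    out = [0.0] * length
    for k in range(1, len(epochs) - 1):
        # Each frame spans from the previous pitch mark to the next one,
        # so it covers two (possibly unequal) pitch periods.
        start, end = epochs[k - 1], epochs[k + 1]
        n = end - start
        win = hann(n)
        for i in range(n):
            # Nearest-neighbour resampling of the stored component to n samples.
            j = min(len(deterministic) - 1, i * len(deterministic) // n)
            sample = deterministic[j] + noise_gain * rng.gauss(0.0, 1.0)
            t = start + i
            if 0 <= t < length:
                out[t] += win[i] * sample
    return out


# Irregularly spaced pitch marks, loosely imitating creaky phonation.
example_epochs = [0, 40, 100, 150, 230]
example = overlap_add_excitation(example_epochs, [1.0] * 16, 0.1, 240)
```

In a real system the noise component would be shaped by the generated spectral and amplitude envelopes rather than by a single gain, and the deterministic component would be selected per phonetic class from the stored database.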
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 42, March 2017, Pages 38-58
Authors
, ,