Time-domain deterministic plus noise model based hybrid source modeling for statistical parametric speech synthesis

Article ID	Journal	Published Year	Pages	File Type
6961013	Speech Communication	2016	19 Pages	PDF

Abstract

This paper proposes a time-domain deterministic plus noise model based hybrid source modeling framework to improve the quality of statistical parametric speech synthesis systems. In the proposed approach, the excitation signal is modeled as a combination of deterministic and noise components. Time-domain pitch-synchronous analysis is performed on the excitation (residual) signal. The deterministic and noise components are estimated from the pitch-synchronous residual frames of each phone. The deterministic components of all phones are systematically arranged in the form of a decision tree. The spectrum and amplitude envelope of the noise components are modeled using hidden Markov models (HMMs). During synthesis, a suitable deterministic component is chosen from a leaf of the decision tree. The noise component is obtained by imposing the target spectrum and amplitude envelope generated from the HMMs. The sum of the deterministic and noise components is pitch-synchronously overlap-added to construct the excitation signal of a phone. The proposed hybrid source modeling approach is incorporated into a statistical parametric speech synthesis system. Performance evaluation results show that the proposed method is capable of producing natural-sounding synthetic speech and that its quality is clearly better than that of the state-of-the-art statistical parametric speech synthesis systems used for comparison. Synthesized speech samples of the proposed and the state-of-the-art methods are available online at http://www.sit.iitkgp.ernet.in/~ksrao/HSM-SPSS/hsm.html.
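The synthesis step described above (summing a deterministic and a noise component, then pitch-synchronously overlap-adding the result) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `synthesize_excitation`, the Hann window, the frame length, and the use of white Gaussian noise shaped only by an amplitude envelope are all assumptions made for illustration. The spectral shaping of the noise and the decision-tree selection of the deterministic component mentioned in the abstract are omitted.

```python
import numpy as np

def synthesize_excitation(pitch_marks, deterministic, noise_envelope,
                          n_samples, seed=0):
    """Hypothetical sketch of pitch-synchronous overlap-add of
    (deterministic + noise) frames. All names are illustrative."""
    rng = np.random.default_rng(seed)
    excitation = np.zeros(n_samples)
    frame_len = len(deterministic)
    half = frame_len // 2
    # Assumption: a Hann window tapers each frame before overlap-add.
    window = np.hanning(frame_len)

    for mark in pitch_marks:
        # Assumption: white noise scaled by a time-domain amplitude
        # envelope; no spectral envelope is imposed in this sketch.
        noise = rng.standard_normal(frame_len) * noise_envelope
        frame = (deterministic + noise) * window

        # Centre the frame on the pitch mark, clipping at signal edges.
        start = mark - half
        lo = max(start, 0)
        hi = min(start + frame_len, n_samples)
        if lo >= hi:
            continue
        excitation[lo:hi] += frame[lo - start:hi - start]
    return excitation

# Usage with made-up values: a decaying pulse as the deterministic part.
det = np.exp(-np.arange(64) / 8.0)
env = 0.05 * np.ones(64)
signal = synthesize_excitation([40, 120, 200], det, env, n_samples=256)
```

In this sketch the deterministic waveform is reused at every pitch mark, whereas a fresh noise realisation is drawn for each frame, which mirrors the split between a repeatable component and a stochastic one.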

Keywords

Statistical parametric speech synthesis