کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
558279 | 874889 | 2014 | 25 صفحه PDF | دانلود رایگان |
• Two adaptation variants of the R waveshape parameter regression are presented.
• We improve glottal source shape parameter estimation by applying Viterbi smoothing.
• Additionally, we propose a novel method to steer Viterbi smoothing.
• Viterbi steering is based on exploiting the covariation of other voice descriptors.
• We evaluate in detail on the 3 CMU Arctic databases BDL, JMK and SLT.
This paper summarizes the results of our investigations into estimating the shape of the glottal excitation source from speech signals. We employ the Liljencrants–Fant (LF) model describing the glottal flow and its derivative. The one-dimensional glottal source shape parameter Rd describes the transition in voice quality from a tense to a breathy voice. The parameter Rd has been derived from a statistical regression of the R waveshape parameters which parameterize the LF model. First, we introduce a variant of our recently proposed adaptation and range extension of the Rd parameter regression. Secondly, we discuss in detail the aspects of estimating the glottal source shape parameter Rd using the phase minimization paradigm. Based on the analysis of a large number of speech signals we describe the major conditions that are likely to result in erroneous Rd estimates. Based on these findings we investigate into means to increase the robustness of the Rd parameter estimation. We use Viterbi smoothing to suppress unnatural jumps of the estimated Rd parameter contours within short time segments. Additionally, we propose to steer the Viterbi algorithm by exploiting the covariation of other voice descriptors to improve Viterbi smoothing. The novel Viterbi steering is based on a Gaussian Mixture Model (GMM) that represents the joint density of the voice descriptors and the Open Quotient (OQ) estimated from corresponding electroglottographic (EGG) signals. A conversion function derived from the mixture model predicts OQ from the voice descriptors. Converted to Rd it defines an additional prior probability to adapt the partial probabilities of the Viterbi algorithm accordingly. Finally, we evaluate the performances of the phase minimization based methods using both variants to adapt and extent the Rd regression on one synthetic test set as well as in combination with Viterbi smoothing and each variant of the novel Viterbi steering on one test set of natural speech. The experimental findings exhibit improvements for both Viterbi approaches.
Journal: Computer Speech & Language - Volume 28, Issue 5, September 2014, Pages 1170–1194