Article code: 558281 | Journal code: 874889 | Publication year: 2014 | English article: 24 pages, PDF | Full text: free download
English title of the ISI article
On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis
Keywords
Speech synthesis, parametric speech synthesis, expressive parametric speech synthesis
Related topics
Engineering and basic sciences, Computer engineering, Signal processing
English abstract


• Analysis of speech parameters for expressive statistical speech synthesis.
• Different spectral and excitation features analyzed.
• Spectral parameters have a bigger impact on expressive synthetic speech.

This paper presents a study on the importance of short-term speech parameterizations for expressive statistical parametric synthesis. Assuming a source-filter model of speech production, the analysis covers spectral parameters, defined here as features that represent a minimum-phase synthesis filter, and excitation parameters, which are features used to construct the signal that is fed to that filter to generate speech. In the first part, different spectral and excitation parameters applicable to statistical parametric synthesis are tested to determine which are the most emotion-dependent. The analysis uses two methods proposed to measure the relative emotion dependency of each feature: one based on K-means clustering and another based on Gaussian mixture modeling for emotion identification. Two commonly used parameterizations of the short-term speech spectral envelope are considered: the Mel cepstrum and the Mel line spectrum pairs. As excitation parameters, the anti-causal cepstrum, the time-smoothed group delay, and band-aperiodicity coefficients are considered. According to the analysis, the line spectrum pairs are the most emotion-dependent parameters. Among the excitation features, the band-aperiodicity coefficients show the highest correlation with the speaker's emotion. The most emotion-dependent parameters according to this analysis were then selected to train an expressive statistical parametric synthesizer using a speaker and language factorization framework. Subjective test results indicate that the considered spectral parameters have a bigger impact on the emotion of the synthesized speech than the excitation parameters.
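
The abstract does not say how the K-means-based measure of emotion dependency is computed, so the following is only a minimal sketch of one plausible reading: cluster the frames of a feature without using the labels, then check how well the clusters line up with the emotion labels. Cluster purity is used here as a stand-in score and is an assumption, not the paper's measure. The function names (`kmeans`, `cluster_purity`, `toy_frames`, `emotion_dependency`), the emotion labels, and the toy Gaussian data are all hypothetical; real input would be frames of Mel cepstrum, line spectrum pairs, or band-aperiodicity coefficients.

```python
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length float tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct frames
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
        if new_assign == assign:  # assignments stopped changing
            break
        assign = new_assign
    return assign


def cluster_purity(assign, labels):
    """Fraction of frames whose cluster's majority label matches their own."""
    by_cluster = {}
    for a, lab in zip(assign, labels):
        by_cluster.setdefault(a, Counter())[lab] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)


def toy_frames(separation, n_per_emotion=100, seed=1):
    """Hypothetical 2-D 'feature frames'; the emotion shifts the first dimension."""
    rng = random.Random(seed)
    frames, labels = [], []
    for i, emotion in enumerate(["neutral", "happy", "sad"]):
        for _ in range(n_per_emotion):
            frames.append((rng.gauss(i * separation, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(emotion)
    return frames, labels


def emotion_dependency(frames, labels):
    """Assumed score: purity of unsupervised clusters against emotion labels."""
    k = len(set(labels))
    return cluster_purity(kmeans(frames, k), labels)


# A feature whose distribution shifts with emotion vs. one that ignores it.
dependent, labels = toy_frames(separation=8.0)
independent, _ = toy_frames(separation=0.0)
score_dep = emotion_dependency(dependent, labels)
score_indep = emotion_dependency(independent, labels)
print(f"emotion-dependent feature:   purity = {score_dep:.2f}")
print(f"emotion-independent feature: purity = {score_indep:.2f}")
```

Under these assumptions, a feature whose clusters align with the emotion labels scores higher than one whose distribution does not change with emotion, which is the kind of relative ranking the abstract describes. The Gaussian-mixture variant would instead fit one mixture per emotion and score a feature by how accurately held-out frames are identified.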

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 28, Issue 5, September 2014, Pages 1209–1232