Two-stage intonation modeling using feedforward neural networks for syllable based text-to-speech synthesis

Article code	Journal code	Year of publication	English article	Full-text version
558317	874897	2013	22-page PDF	Free download

Keywords

Prediction accuracy; Text-to-speech synthesis; Feedforward neural networks; Tilt; Phonological; Contextual; Production constraints; Positional

Related topics

Engineering and Basic Sciences; Computer Engineering; Signal Processing

Abstract

This paper proposes a two-stage feedforward neural network (FFNN) based approach for modeling the fundamental frequency (F0) values of a sequence of syllables. In this study, (i) linguistic constraints represented by positional, contextual and phonological features, (ii) production constraints represented by articulatory features and (iii) linguistic relevance tilt parameters are proposed for predicting intonation patterns. In the first stage, tilt parameters are predicted using linguistic and production constraints. In the second stage, F0 values of the syllables are predicted using the tilt parameters predicted in the first stage, together with the basic linguistic and production constraints. The prediction performance of the neural network models is evaluated using objective measures: average prediction error (μ), standard deviation (σ) and linear correlation coefficient (γ_{X,Y}). The prediction accuracy of the proposed two-stage FFNN model is compared with that of other statistical models, namely Classification and Regression Tree (CART) and Linear Regression (LR) models. The intonation models are also assessed through listening tests that evaluate the quality of the speech synthesized after the intonation models are incorporated into the baseline system. The evaluation shows that the two-stage FFNN models predict more accurately than the other models.
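The abstract names three objective measures but does not give their formulas. As a minimal sketch, the Python below assumes commonly used definitions: μ as the mean absolute difference between actual and predicted F0, σ as the standard deviation of those absolute differences, and γ as the Pearson correlation between the two sequences. The function name `objective_measures` and these definitions are assumptions for illustration, not taken from the paper.

```python
import math


def objective_measures(actual, predicted):
    """Return (mu, sigma, gamma) for paired actual/predicted F0 values.

    Assumed definitions (not confirmed by the abstract):
      mu    = mean absolute prediction error
      sigma = standard deviation of the absolute errors
      gamma = Pearson linear correlation coefficient
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(actual)

    # Absolute error per syllable, then its mean and spread.
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)

    # Pearson correlation: covariance divided by the product of std devs.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    if std_a == 0.0 or std_p == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    gamma = cov / (std_a * std_p)
    return mu, sigma, gamma
```

Under these assumptions, a lower μ and σ and a γ closer to 1 would indicate a better intonation model.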

► Positional, contextual, phonological and articulatory features extracted from the text are proposed for modeling the intonation patterns.
► Tilt parameters derived from the above features are proposed for improving the accuracy of prediction.
► Single-stage and two-stage neural network models are proposed for capturing the intonation patterns from the proposed features.
► Impact of individual features in predicting the intonation patterns is analyzed.
► The influence of duration and intensity constraints on predicting the intonation patterns is explored.
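The two-stage data flow described in the abstract can be sketched as follows: a first network maps the syllable's linguistic and production features to tilt parameters, and a second network maps those same features plus the predicted tilt parameters to F0 values. This is an illustration only. The layer sizes, the tanh activation, the feature and output dimensions (`N_FEATURES`, `N_TILT`, `N_F0`) and all function names are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def make_ffnn(sizes, rng):
    """Build a feedforward network from a list of layer sizes."""
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(layers, x):
    """Forward pass: tanh on hidden layers, linear output layer (assumed)."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


# Hypothetical dimensions; the abstract does not state them.
N_FEATURES = 10  # positional, contextual, phonological and articulatory features
N_TILT = 3       # tilt parameters predicted in stage 1
N_F0 = 3         # F0 values predicted per syllable in stage 2

rng = random.Random(0)
stage1 = make_ffnn([N_FEATURES, 8, N_TILT], rng)
stage2 = make_ffnn([N_FEATURES + N_TILT, 8, N_F0], rng)


def predict_f0(features):
    """Stage 1 predicts tilt; stage 2 predicts F0 from features plus tilt."""
    tilt = forward(stage1, features)
    f0 = forward(stage2, features + tilt)
    return tilt, f0
```

Calling `predict_f0([0.1] * N_FEATURES)` returns a pair of lists, the stage-1 tilt outputs and the stage-2 F0 outputs. A single-stage model, by contrast, would map the features to F0 directly with one network.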

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 5, August 2013, Pages 1105–1126

Authors

V. Ramu Reddy, K. Sreenivasa Rao
