Tone-Group F0 selection for modeling focus prominence in small-footprint speech synthesis

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
568982	876509	2006	22 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

NLP, Natural Language Processing Text-to-speech synthesis - سنتز متن به گفتار

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال

پیش نمایش صفحه اول مقاله

Tone-Group F0 selection for modeling focus prominence in small-footprint speech synthesis

چکیده انگلیسی

This work targets to improve the naturalness of synthetic intonational contours in Text-to-Speech synthesis through the provision of prominence, which is a major expression of human speech. Focusing on the tonal dimension of emphasis, we present a robust unit-selection methodology for generating realistic F0 curves in cases where focus prominence is required. The proposed approach is based on selecting Tone-Group units from commonly used prosodic corpora that are automatically transcribed as patterns of syllables. In contrast to related approaches, patterns represent only the most perceivable sections of the sampled curves and are encoded to serve morphologically different sequence of syllables. This results in a minimization of the required amount of units so as to achieve sufficient coverage within the database. Nevertheless, this optimization enables the application of high-quality F0 generation to small-footprint text-to-speech synthesis. For generic F0 selection we query the database based on sequences of ToBI labels, though other intonational frameworks can be used as well. To realize focus prominence on specific Tone-Groups the selection also incorporates a level indicator of emphasis. We set up a series of listening tests by exploiting a database built from a 482-utterance corpus, which featured partially purpose-uttered emphasis. The results showed a clear subjective preference of the proposed model against a linear regression one in 75% of the cases when used in generic synthesis. Furthermore, this model provided ambiguous percept of emphasis in an experiment featuring major and minor degrees of prominence.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Speech Communication - Volume 48, Issue 9, September 2006, Pages 1057–1078

نویسندگان

Gerasimos Xydas, Georgios Kouroupetroglou,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Tone-Group F0 selection for modeling focus prominence in small-footprint speech synthesis

دسترسی سریع

ارتباط

English Website