Production of filled pauses in concatenative speech synthesis based on the underlying fluent sentence

Article code	Journal code	Publication year	English article	Full-text version
567489	876085	2012	18-page PDF	Free download

Keywords

Disfluency; Perceptual evaluation; Speech synthesis; Prosody; Conversational speech

Related topics

Engineering and Basic Sciences; Computer Engineering; Signal Processing

Abstract

Until now, speech synthesis has mainly involved reading-style speech. Today, however, text-to-speech systems must provide a variety of styles because users expect these interfaces to do more than just read information. If synthetic voices are to be integrated into future technology, they must simulate the way people talk instead of the way people read. Existing knowledge about how disfluencies occur has made it possible to propose a general framework for synthesising disfluencies. We propose a model based on the definition of disfluency and the concept of underlying fluent sentences. The model combines the parameters of standard prosodic models for fluent speech with local modifications of prosodic parameters near the interruption point. The constituents of the local models for filled pauses are derived from the analysis corpus, and each constituent's prosodic parameters are predicted via linear regression analysis. We also discuss the implementation details of the model when used in a real speech synthesis system. Objective and perceptual evaluations showed that the proposed models outperformed the baseline model. Perceptual evaluations of the system showed that it is possible to synthesise filled pauses without decreasing the overall naturalness of the system, and users stated that the speech produced is even more natural than speech produced without filled pauses.

Highlights
► We propose a method for generating filled pauses in unit-selection speech synthesis.
► The underlying fluent sentence model is a general framework for the prosodic modelling of disfluencies.
► Linear regression analysis can model local prosodic variation in the case of filled pauses.
► We detail the architectural concerns related to the conversational module of the Ogmios speech synthesis system.
► Perceptual tests validated the method for isolated sentences as well as for a pragmatic scenario.
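
The abstract and highlights state that local prosodic parameters of filled pauses are predicted via linear regression. As a rough illustration of that idea only, the sketch below fits an ordinary least-squares line that maps one prosodic feature of the underlying fluent sentence to one parameter of the filled pause. The feature choice (duration of a neighbouring fluent syllable), the function names `fit_line` and `predict`, and all numbers are assumptions made up for this example; they are not taken from the paper or from the Ogmios system.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of the predictor.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    # Sum of cross-deviations between predictor and target.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new predictor value."""
    intercept, slope = model
    return intercept + slope * x


# Toy values invented for illustration (NOT data from the paper):
# hypothetical duration of a syllable in the underlying fluent sentence (ms)
fluent_syllable_ms = [120.0, 150.0, 180.0, 210.0]
# hypothetical duration of the filled pause observed next to it (ms)
filled_pause_ms = [290.0, 350.0, 410.0, 470.0]

model = fit_line(fluent_syllable_ms, filled_pause_ms)
predicted = predict(model, 165.0)
print(model, predicted)
```

A real system would presumably use several predictors and a separate regression per constituent and per parameter (for example duration and pitch); the single-predictor form is used here only to keep the sketch short.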

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 54, Issue 3, March 2012, Pages 459–476

Authors

Jordi Adell, David Escudero, Antonio Bonafonte
