Free article download: Intelligibility of time-compressed synthetic speech: Compression method and speaking style

Article code	Journal code	Publication year	English article	Full-text version
566718	1452026	2015	13-page PDF	Free download

English title of the ISI article

Intelligibility of time-compressed synthetic speech: Compression method and speaking style

Persian translation of the title

قابل فهم بودن گفتار مصنوعی فشرده شده با زمان: روش فشرده سازی و سبک گفتار

Keywords

HMM-based speech synthesis; Fast speech; Time-compression

Related topics

Engineering and Basic Sciences > Computer Engineering > Signal Processing

English abstract

• Analysis of how intelligible listeners find natural and synthetic time-compressed speech.
• Different compression methods are applied to normal and fast speech.
• We evaluated one linear method and two non-linear methods that act on the duration model.
• The linear method outperforms the others, particularly at ultra-fast (3x) rates.
• The gain from using fast speech data depends on how intelligible that data is.

We present a series of intelligibility experiments performed on natural and synthetic speech time-compressed at a range of rates, and analyze the effect of speech corpus and compression method on the intelligibility scores of sighted and blind individuals. In particular, we are interested in comparing linear and non-linear compression methods applied to normal and fast speech of different speakers. We recorded English- and German-language voice talents reading prompts at a normal and a fast rate. To create synthetic voices, we trained a statistical parametric speech synthesis system on the normal and the fast data of each speaker. We compared three compression methods: scaling the variance of the state duration model, interpolating the duration models of the fast and the normal voices, and applying a linear compression method to the generated speech waveform. Word recognition results for the English voices show that generating speech at a normal speaking rate and then applying linear compression resulted in the most intelligible speech at all tested rates. A similar result was found when evaluating the intelligibility of the natural speech corpus. For the German voices, interpolation was found to be better at moderate speaking rates, but the linear method was again more successful at very high rates, particularly when applied to the fast data. Phonemic-level annotation of the normal and fast databases showed that the German speaker was able to produce speech at a fast rate with fewer deletion and substitution errors than the English speaker, supporting the intelligibility benefits observed when compressing his fast speech. This shows that using fast speech data to create faster synthetic voices does not necessarily lead to more intelligible voices, as results depend heavily on how successful the speaker was at speaking fast while maintaining intelligibility. Linear compression applied to normal-rate speech can more reliably provide higher intelligibility, particularly at ultra-fast rates.
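
The three compression methods compared in the abstract can be illustrated with a toy sketch at the level of state durations. This is not the authors' implementation. The function names, the example means and variances, and the rule d_k = mean_k + rho * var_k (a common way of controlling speaking rate in HMM-based synthesis) are assumptions made for illustration. In the paper the linear method is applied to the generated waveform, so the uniform scaling of durations below is only a stand-in for it.

```python
import numpy as np


def variance_scaled_durations(mean, var, target_total):
    """Assumed rule: d_k = mean_k + rho * var_k, with rho chosen so that
    the durations sum to target_total. States with larger variance
    absorb more of the compression (non-linear)."""
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    rho = (target_total - mean.sum()) / var.sum()
    return mean + rho * var


def interpolated_durations(mean_normal, mean_fast, alpha):
    """Linear interpolation between the normal and fast duration means.
    alpha = 0 gives the normal voice, alpha = 1 the fast voice."""
    mean_normal = np.asarray(mean_normal, dtype=float)
    mean_fast = np.asarray(mean_fast, dtype=float)
    return (1.0 - alpha) * mean_normal + alpha * mean_fast


def uniformly_scaled_durations(mean, rate):
    """Stand-in for linear compression: every state shrinks by the same factor."""
    return np.asarray(mean, dtype=float) / rate


# Hypothetical duration model for three states (values in frames).
mean_normal = [4.0, 10.0, 6.0]
var_normal = [1.0, 9.0, 2.0]
mean_fast = [2.0, 6.0, 4.0]

# Compress a 20-frame utterance to 10 frames (2x rate).
d_var = variance_scaled_durations(mean_normal, var_normal, target_total=10.0)
d_int = interpolated_durations(mean_normal, mean_fast, alpha=0.5)
d_lin = uniformly_scaled_durations(mean_normal, rate=2.0)

print("variance scaling:", d_var)
print("interpolation:   ", d_int)
print("uniform scaling: ", d_lin)
```

In this toy example the variance-scaling rule shortens the high-variance middle state far more than the others, while uniform scaling preserves the relative proportions of the states. At high rates the variance-scaling rule can produce very small or negative values, so a practical system would need to round and enforce a minimum duration.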

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 74, November 2015, Pages 52–64

Authors

Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, Junichi Yamagishi
