Article code: 566718
Journal code: 1452026
Publication year: 2015
English article: 13 pages, PDF
Full-text version: free download
English title of the ISI article
Intelligibility of time-compressed synthetic speech: Compression method and speaking style
Persian translation of the title
قابل فهم بودن گفتار مصنوعی فشرده شده با زمان: روش فشرده سازی و سبک گفتار
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


• Analysis of how intelligible natural and synthetic time-compressed speech is to listeners.
• Different compression methods are applied to normal and fast speech.
• We evaluated a linear method and two non-linear methods that act on the duration model.
• The linear method outperforms the others, particularly for ultra-fast (3x) rates.
• The gain from using fast speech data depends on how intelligible that data is.

We present a series of intelligibility experiments performed on natural and synthetic speech time-compressed at a range of rates and analyze the effect of speech corpus and compression method on the intelligibility scores of sighted and blind individuals. In particular, we are interested in comparing linear and non-linear compression methods applied to normal and fast speech of different speakers. We recorded English and German voice talents reading prompts at a normal and a fast rate. To create synthetic voices, we trained a statistical parametric speech synthesis system on the normal and the fast data of each speaker. We compared three compression methods: scaling the variance of the state duration model, interpolating the duration models of the fast and the normal voices, and applying a linear compression method to the generated speech waveform. Word recognition results for the English voices show that generating speech at a normal speaking rate and then applying linear compression resulted in the most intelligible speech at all tested rates. A similar result was found when evaluating the intelligibility of the natural speech corpus. For the German voices, interpolation was found to be better at moderate speaking rates, but the linear method was again more successful at very high rates, particularly when applied to the fast data. Phonemic-level annotation of the normal and fast databases showed that the German speaker was able to produce speech at a fast rate with fewer deletion and substitution errors than the English speaker, supporting the intelligibility benefits observed when compressing his fast speech. This shows that using fast speech data to create faster synthetic voices does not necessarily lead to more intelligible voices, as results depend strongly on how successful the speaker was at speaking fast while maintaining intelligibility. Linear compression applied to normal-rate speech can more reliably provide higher intelligibility, particularly at ultra-fast rates.
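
The three compression methods named in the abstract can be contrasted with a small sketch of how each one shortens per-state durations. This is an illustration under stated assumptions, not the authors' implementation. The function names and the toy numbers are hypothetical. The variance-scaling rule shown (duration = mean + rho * variance, with rho chosen to hit a target total length) is the rule commonly used in HMM-based synthesis and is assumed here. The linear method in the abstract operates on the generated waveform, so it is modelled below only by its effect on durations, a uniform division by the rate.

```python
def variance_scaled_durations(means, variances, target_total):
    """Assumed HMM-synthesis rule: d_k = mean_k + rho * var_k.

    rho is chosen so the durations sum to target_total, which means
    states with larger variance absorb more of the compression.
    """
    rho = (target_total - sum(means)) / sum(variances)
    return [m + rho * v for m, v in zip(means, variances)]


def interpolated_durations(normal_means, fast_means, alpha):
    """Blend the normal and fast duration models.

    alpha = 0 gives the normal voice, alpha = 1 the fast voice,
    and alpha > 1 extrapolates beyond the fast voice.
    """
    return [(1 - alpha) * n + alpha * f
            for n, f in zip(normal_means, fast_means)]


def linearly_compressed_durations(durations, rate):
    """Uniform compression: every segment shrinks by the same factor."""
    return [d / rate for d in durations]


# Toy example (hypothetical values, durations in frames).
normal_means = [10, 20, 30]
normal_vars = [1, 4, 5]
fast_means = [6, 14, 20]

# Compress to half the original length (a 2x rate).
print(variance_scaled_durations(normal_means, normal_vars, 30))  # [7.0, 8.0, 15.0]
print(interpolated_durations(normal_means, fast_means, 0.5))     # [8.0, 17.0, 25.0]
print(linearly_compressed_durations(normal_means, 2))            # [5.0, 10.0, 15.0]
```

In the toy output, variance scaling and linear compression both reach the same total length (30 frames) but distribute it differently across states, which is the sense in which the first is non-linear. A real system would also enforce a minimum duration of one frame per state, which this sketch omits.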

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 74, November 2015, Pages 52–64