Article code: 566669
Journal code: 1452019
Publication year: 2016
English article: 16-page PDF
English title of the ISI article
Phase perception of the glottal excitation and its relevance in statistical parametric speech synthesis
Persian translation of the title
ادراک فاز تحریک گلوتال و ارتباط آن در سنتز گفتار پارامتری آماری
Keywords
Statistical parametric speech synthesis; Phase perception; Glottal flow excitation
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
Abstract


• Phase perception of the glottal excitation is studied.
• Source-filter vocoder is used to modify pitch-synchronous excitation phase pattern.
• Natural-phase, zero-phase, and random-phase excitations are compared.
• Various speakers and speaking styles are utilized in subjective listening tests.
• Results show that using natural phase information improves speech quality.

While the characteristics of the amplitude spectrum of the voiced excitation have been studied widely both in natural and synthetic speech, the role of the excitation phase has remained less explored. This contradicts findings observed in sound perception studies indicating that humans are not phase deaf. Especially in speech synthesis, phase information is often omitted for simplicity. This study investigates the impact of phase information of the excitation signal of voiced speech and its relevance in statistical parametric speech synthesis. The experiments in the study involve, firstly, converting the pitch-synchronously computed original phase spectra of the excitation waveforms (either glottal flow waveforms or residuals) to either zero phase, cyclostationary random phase, or random phase. Secondly, the quality of synthetic speech in each case is compared in subjective listening tests to the corresponding signal excited with the original, natural phase. Experiments are conducted with natural, vocoded, and synthetic speech using voice material from various speakers with varying speaking styles, such as breathy, normal, and Lombard speech. The results indicate that the phase spectrum of the voiced excitation has a perceptually relevant effect in natural, vocoded, and synthetic speech, and utilizing the phase information in speech synthesis leads to improved speech quality.
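
The phase conversion described in the abstract can be illustrated with a minimal sketch. It assumes NumPy and a single pitch-synchronous excitation frame held in a 1-D array; the function name `modify_phase` and its `mode` argument are hypothetical and are not taken from the paper. The sketch keeps the amplitude spectrum of the frame and replaces only the phase spectrum. The cyclostationary random-phase condition is not covered here.

```python
import numpy as np


def modify_phase(frame, mode, rng=None):
    """Return a frame with the same amplitude spectrum and a replaced phase.

    Hypothetical helper for illustration only, not the authors' vocoder.
    mode: "natural" (keep original phase), "zero", or "random".
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    magnitude = np.abs(spectrum)

    if mode == "natural":
        phase = np.angle(spectrum)
    elif mode == "zero":
        # All components start in phase: the result is a pulse that is
        # circularly symmetric around sample 0.
        phase = np.zeros_like(magnitude)
    elif mode == "random":
        rng = np.random.default_rng() if rng is None else rng
        phase = rng.uniform(-np.pi, np.pi, size=magnitude.shape)
        # DC (and Nyquist, for even lengths) must stay real so that the
        # amplitude spectrum is preserved after the inverse transform.
        phase[0] = 0.0
        if len(frame) % 2 == 0:
            phase[-1] = 0.0
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return np.fft.irfft(magnitude * np.exp(1j * phase), n=len(frame))
```

Applied frame by frame to glottal flow or residual pulses before overlap-add resynthesis, a function of this kind would produce the zero-phase and random-phase conditions; the framing and resynthesis steps are left out of the sketch.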

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 81, July 2016, Pages 104–119