Article code: 566032
Journal code: 875912
Publication year: 2008
English article: 13 pages, PDF
Full-text version: Free download
English title of the ISI article
Tone correctness improvement in speaker dependent HMM-based Thai speech synthesis
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract

In this paper, we describe a novel approach to Thai speech synthesis. Spectrum, fundamental frequency (F0), and phone duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently using decision-tree-based context clustering. A group of contextual factors that affect spectrum, F0, and state duration, e.g., tone type and part of speech, is taken into account. Since Thai is a tonal language, not only the intelligibility and naturalness but also the tone correctness of the synthesized speech is considered. To improve tone correctness, tone groups and tone types are used to design four different decision-tree structures for the tree-based context clustering process: a single binary tree structure, a simple tone-separated tree structure, a constancy-based-tone-separated tree structure, and a trend-based-tone-separated tree structure. A subjective evaluation of tone correctness is conducted using the tone perception of eight Thai listeners. The simple tone-separated tree structure gives the highest tone correctness, while the single binary tree structure gives the lowest. In addition to the tree structure, additional contextual tone information applied to all decision-tree structures yields a significant improvement in tone correctness. Moreover, an evaluation of syllable duration distortion among the four structures shows that the constancy-based-tone-separated and trend-based-tone-separated tree structures can alleviate the distortions that appear with the simple tone-separated tree structure. Finally, MOS and CCR tests show that the implemented system reproduces prosody (or naturalness, in some sense) better than a unit-selection-based system built on the same speech database.
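
As a rough illustration of how a single binary tree differs from a simple tone-separated structure, the following minimal Python sketch shows only how training contexts might be pooled before decision-tree clustering begins. It is not the authors' implementation: the context dictionaries, their field names, and both function names are hypothetical, and the sketch assumes that "tone-separated" means one tree per tone type.

```python
from collections import defaultdict

# The five lexical tones of Thai.
THAI_TONES = ("mid", "low", "falling", "high", "rising")


def pool_single_tree(contexts):
    """Single binary tree: all contexts start in one shared root pool,
    so tone is just one of many questions the tree may or may not ask."""
    return {"all": list(contexts)}


def pool_tone_separated(contexts):
    """Simple tone-separated structure (assumed): contexts are split by
    tone type first, and each pool would be clustered by its own tree,
    so units with different tones can never share a leaf."""
    pools = defaultdict(list)
    for ctx in contexts:
        pools[ctx["tone"]].append(ctx)
    return dict(pools)


# Hypothetical toy contexts; real systems use far richer context labels.
contexts = [
    {"phone": "a", "tone": "mid", "pos": "noun"},
    {"phone": "a", "tone": "falling", "pos": "verb"},
    {"phone": "i", "tone": "mid", "pos": "verb"},
    {"phone": "u", "tone": "rising", "pos": "noun"},
]

single = pool_single_tree(contexts)
separated = pool_tone_separated(contexts)

for tone, pool in separated.items():
    print(tone, [c["phone"] for c in pool])
```

In the tone-separated case the tone distinction is enforced by construction rather than left to the clustering criterion, which is one plausible reading of why that structure scores highest on tone correctness in the listening test.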

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 50, Issue 5, May 2008, Pages 392–404