Article code: 567128
Journal code: 1452043
Publication year: 2013
English article: 20 pages, PDF
Full-text version: free download
English title of the ISI article
Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis
Related subjects
Engineering and Basic Sciences > Computer Engineering > Signal Processing
Abstract

The use of visemes as atomic speech units in visual speech analysis and synthesis systems is well-established. Viseme labels are determined using a many-to-one phoneme-to-viseme mapping. However, due to visual coarticulation effects, an accurate mapping from phonemes to visemes should define a many-to-many mapping scheme instead. In this research it was found that neither the use of standardized nor speaker-dependent many-to-one viseme labels could satisfy the quality requirements of concatenative visual speech synthesis. Therefore, a novel technique to define a many-to-many phoneme-to-viseme mapping scheme is introduced, which makes use of both tree-based and k-means clustering approaches. We show that these many-to-many viseme labels more accurately describe the visual speech information as compared to both phoneme-based and many-to-one viseme-based speech labels. In addition, we found that the use of these many-to-many visemes improves the precision of the segment selection phase in concatenative visual speech synthesis using limited speech databases. Furthermore, the resulting synthetic visual speech was both objectively and subjectively found to be of higher quality when the many-to-many visemes are used to describe the speech database and the synthesis targets.


► The use of many-to-one phoneme-to-viseme mappings for visual speech synthesis was evaluated.
► These many-to-one mappings are unable to accurately describe the visual speech information.
► Novel many-to-many phoneme-to-viseme mapping schemes were constructed.
► These are able to describe the visual speech information more accurately than other approaches.
► They improve the quality of concatenative visual speech synthesis using limited databases.
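
To make the difference between many-to-one and many-to-many labels concrete, here is a minimal sketch, not the authors' method. It clusters individual phoneme instances by their visual features, so one phoneme can receive different viseme labels in different contexts. The feature values, the two feature names (lip opening, lip width), the choice of three clusters, and the initial centroids are all invented for illustration. The paper combines tree-based and k-means clustering; only a plain k-means step is shown here.

```python
from collections import defaultdict

# Hypothetical phoneme instances: (phoneme, lip_opening, lip_width).
# All numbers are made up for illustration; they are not from the paper.
INSTANCES = [
    ("p", 0.0, 0.5), ("b", 0.1, 0.5), ("m", 0.0, 0.4),  # closed lips
    ("t", 0.4, 0.8), ("s", 0.5, 0.8),                   # spread-lip context
    ("t", 0.4, 0.1), ("s", 0.5, 0.1),                   # rounded-lip context
]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=20):
    """Plain k-means; returns one cluster index per point."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def phoneme_to_visemes(instances, labels):
    """Collect every viseme (cluster) label observed for each phoneme."""
    mapping = defaultdict(set)
    for (phoneme, _, _), label in zip(instances, labels):
        mapping[phoneme].add(label)
    return dict(mapping)


points = [(opening, width) for _, opening, width in INSTANCES]
# Fixed initial centroids keep the toy example deterministic.
labels = kmeans(points, [points[0], points[3], points[5]])
mapping = phoneme_to_visemes(INSTANCES, labels)

for phoneme in sorted(mapping):
    print(phoneme, sorted(mapping[phoneme]))
```

In this toy data the bilabials /p/, /b/, /m/ share a single cluster, which is what a many-to-one table would also encode, while /t/ and /s/ each fall into two clusters depending on the lip shape of their context. A fixed phoneme-to-viseme lookup table cannot represent that second case.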

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 55, Issues 7–8, September 2013, Pages 857–876