Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
6960878 | 1452005 | 2017 | 21-page PDF | Free download |
English title of the ISI article
Meaningful head movements driven by emotional synthetic speech
Persian translation of the title
حرکات معنادار سر و گردن از طریق گفتار مصنوعی هیجانی است
Keywords
Conversational agents, head movement with meaning, speech animation
Related topics
Engineering and basic sciences
Computer engineering
Signal processing
Abstract
Speech-driven head movement methods are motivated by the strong coupling that exists between head movements and speech, providing an appealing solution to create behaviors that are synchronized in time with speech. This paper offers solutions for two of the problems associated with these methods. First, speech-driven methods require all the potential utterances of the conversational agent (CA) to be recorded, which limits their applications. Using existing text-to-speech (TTS) systems scales the applications of these methods by providing the flexibility of using text instead of pre-recorded speech. However, simply training speech-driven models with natural speech and testing them with synthetic speech creates a mismatch that affects the performance of the system. This paper proposes a novel strategy to solve this mismatch. The proposed approach starts by creating a parallel corpus with either neutral or emotional synthetic speech, time-aligned with the original speech for which we have the motion capture recordings. This parallel corpus is used to retrain the models from scratch, or to adapt the models originally built with natural speech. Both subjective and objective evaluations show the effectiveness of this solution in reducing the mismatch. Second, creating head movement with speech-driven methods can disregard the meaning of the message, even when the movements are perfectly synchronized with speech. The trajectory of head movements in conversations also has a role in conveying meaning (e.g., head nods for acknowledgment). In fact, our analysis reveals that head movements under different discourse functions have distinguishable patterns. Building on the best models driven by synthetic speech, we propose to extract dialog acts directly from the text and use this information to directly constrain our models. Compared to the unconstrained model, the constrained model generates head motion sequences that are not only closer to the statistical patterns of the original head movements, but are also perceived as more natural and appropriate.
Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 95, December 2017, Pages 87-99
Authors
Najmeh Sadoughi, Yang Liu, Carlos Busso