Article Code | Journal Code | Publication Year | English Article | Full-Text Version |
---|---|---|---|---|
565961 | 1452044 | 2011 | 16-page PDF | Free download |

In this work we develop and apply a class of hierarchical directed graphical models to the task of recognizing affective categories from prosody in both acted and natural speech. A strength of this approach is the integration and summarization of information from both local (e.g., syllable-level) and global (e.g., utterance-level) prosodic phenomena. In this framework, speech is modeled as a dynamically evolving hierarchy whose levels are determined by prosodic constituency and whose parameters evolve according to dynamical systems. The acoustic parameters were chosen to capture four main components of speech thought to carry paralinguistic and affect-specific information: intonation, loudness, rhythm, and voice quality. The approach is first evaluated on a database of acted emotions and compared to human perceptual recognition of five affective categories, where it achieves rates within roughly 10% of human recognition accuracy despite using prosody alone. The model is then evaluated on two corpora of fully spontaneous, affectively colored, naturally occurring speech between people: Call Home English and BT Call Center. Here the ground-truth labels are obtained from the agreement of 29 human coders labeling arousal and valence. The best discrimination performance on the natural spontaneous speech, using only the prosodic features, is a 70% detection rate at a 30% false-alarm rate when detecting high-arousal, negative-valence speech in call centers.
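The core architectural idea in the abstract is a hierarchy of dynamical systems tied to prosodic constituents: a slowly varying utterance-level process sets the context within which faster syllable-level parameters evolve. The sketch below illustrates only that generic two-level generative structure with a toy linear-Gaussian system; the function name, dimensions, transition coefficients, and noise levels are invented for illustration and do not correspond to the authors' actual model or feature set.

```python
# Toy illustration of a two-level hierarchical dynamical system:
# a slowly evolving utterance-level state modulates the mean of a faster
# syllable-level process whose noisy output stands in for a prosodic
# contour (e.g., frame-level log-F0). Illustrative only; not the paper's model.
import numpy as np

rng = np.random.default_rng(0)

def simulate_utterance(n_syllables=8, frames_per_syllable=20):
    """Generate a toy prosodic contour from a two-level hierarchy."""
    contour = []
    z = 0.0                      # utterance-level (global) state
    for _ in range(n_syllables):
        # Global state drifts slowly, updated once per prosodic constituent.
        z = 0.95 * z + rng.normal(scale=0.1)
        x = z                    # syllable-level state initialized at the global level
        for _ in range(frames_per_syllable):
            # Local state evolves quickly around the current global state.
            x = z + 0.8 * (x - z) + rng.normal(scale=0.05)
            # Noisy frame-level observation of the local state.
            contour.append(x + rng.normal(scale=0.02))
    return np.array(contour)

if __name__ == "__main__":
    y = simulate_utterance()
    print(f"{y.size} frames, mean={y.mean():.3f}, std={y.std():.3f}")
```

Inference in the paper proceeds in the opposite direction: given observed contours, the hierarchical model summarizes them into constituent- and utterance-level statistics used for affect classification.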
► We introduce a machine learning framework for modeling hierarchical phenomena.
► This framework can be used to predict affective prosodic changes from speech.
► Performance on labeling acted emotions falls within 10% of human accuracy figures.
► On spontaneous speech, the model detects 70% of negative-valence episodes.
Journal: Speech Communication - Volume 53, Issues 9–10, November–December 2011, Pages 1088–1103