Article code | Journal code | Publication year | English article | Full-text version
565911 | 1452041 | 2014 | 12 pages, PDF | Free download
English title of the ISI article
Compensating for speaker or lexical variabilities in speech for emotion recognition
Persian translation of the title
جبران خسارت برای سخنران یا تنوع واژگان در سخنرانی برای شناخت احساسات
Keywords
Emotion recognition, factor analysis, feature normalization, speaker variability
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
English abstract


• An exhaustive analysis of various acoustic features to identify their dependencies on lexical, speaker and emotional aspects.
• A novel trajectory-based normalization scheme to compensate for speaker or lexical variabilities.
• Classification experiments that validate the proposed normalization approach on the IEMOCAP database.

Affect recognition is a crucial requirement for future human-machine interfaces to respond effectively to the nonverbal behaviors of the user. Speech emotion recognition systems analyze acoustic features to deduce the speaker’s emotional state. However, the human voice conveys a mixture of information, including speaker, lexical, cultural, physiological and emotional traits. The presence of these communication aspects introduces variabilities that affect the performance of an emotion recognition system. Therefore, building robust emotional models requires careful consideration of how to compensate for the effect of these variabilities. This study aims to factorize speaker characteristics, verbal content and expressive behaviors in various acoustic features. The factorization technique consists of building phoneme-level trajectory models for the features. We propose a metric to quantify the dependency between acoustic features and communication traits (i.e., speaker, lexical and emotional factors). This metric, which is motivated by the mutual information framework, estimates the uncertainty reduction in the trajectory models when a given trait is considered. The analysis provides important insights into the dependency between the features and the aforementioned factors. Motivated by these results, we propose a feature normalization technique based on the whitening transformation that aims to compensate for speaker and lexical variabilities. The benefit of employing this normalization scheme is validated with the presented factor analysis method. The emotion recognition experiments show that the normalization approach can attenuate the variability imposed by the verbal content and speaker identity, yielding relative performance improvements of 4.1% and 2.4%, respectively, on a selected set of features.
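
The abstract names a whitening transformation as the basis of the normalization but gives no implementation details. The following is only a minimal sketch of a generic PCA whitening step applied separately to each group of feature vectors (for example, each speaker or each phoneme). The function names `whiten` and `whiten_per_group`, the `eps` floor, and the use of NumPy are assumptions made for illustration. The trajectory modeling described in the paper is not reproduced here.

```python
import numpy as np


def whiten(features, eps=1e-8):
    """PCA-whiten the rows of `features` (n_samples x n_dims).

    The output has zero mean and (approximately) identity covariance.
    `eps` is an illustrative floor that avoids division by zero for
    near-degenerate dimensions.
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    scale = 1.0 / np.sqrt(np.maximum(eigvals, eps))
    # Rotate onto the principal axes, then rescale each axis to unit variance.
    return (centered @ eigvecs) * scale


def whiten_per_group(features, groups):
    """Whiten each group (hypothetically: one speaker or one phoneme) on its own.

    Removing each group's mean and covariance is one simple way to
    attenuate group-specific variability before classification.
    """
    X = np.asarray(features, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(X)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = whiten(X[mask])
    return out
```

With a feature matrix `X` and one group label per row, `whiten_per_group(X, labels)` returns features whose first- and second-order statistics no longer differ across groups. Whether that matches the paper's exact scheme cannot be determined from the abstract alone.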

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 57, February 2014, Pages 1–12