Anger recognition in speech using acoustic and linguistic cues

Article code	Journal code	Year of publication	English article	Full-text version
565969	1452044	2011	12-page PDF	Free download

Keywords

Emotion detection, Decision fusion

Related topics

Engineering and basic sciences, Computer engineering, Signal processing

Abstract

The present study explores both linguistic and acoustic feature modeling for anger classification. For acoustic modeling, we generate statistics from acoustic audio descriptors, e.g. pitch, loudness and spectral characteristics. Ranking our features, we find that loudness and MFCC seem most promising for all databases; for the English database, pitch features are also important. For linguistic modeling, we apply probabilistic and entropy-based models of words and phrases, e.g. Bag-of-Words (BOW), Term Frequency (TF), Term Frequency – Inverse Document Frequency (TF.IDF) and Self-Referential Information (SRI). SRI clearly outperforms the vector space models, and modeling phrases slightly improves the scores. After classifying acoustic and linguistic information on separate levels, we fuse the information at the decision level by adding confidences. We compare the obtained scores on three different databases: two are taken from the IVR customer care domain, and the third is a WoZ data collection. All corpora reflect realistic speech conditions. We observe promising results for the IVR databases, while the WoZ database shows lower scores overall. To make the results comparable, we evaluate classification success using the f1 measure in addition to overall accuracy. Overall, acoustic modeling clearly outperforms linguistic modeling, and fusion slightly improves overall scores. Against a baseline of approximately 60% accuracy and .40 f1 from constant majority class voting, we obtain an accuracy of 75% with a corresponding f1 of .70 for the WoZ database. For the IVR databases we obtain approximately 79% accuracy with a corresponding f1 of .78, against a baseline of 60% accuracy and .38 f1.
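The baseline figures in the abstract show why accuracy alone can mislead on imbalanced data: a classifier that always votes for the majority class reaches about 60% accuracy but a much lower f1. The following minimal sketch reproduces that effect under two assumptions that are not stated in the abstract: that f1 is the unweighted mean of the per-class F1 scores, and that the classes split exactly 60/40. The label names and function names are hypothetical.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed definition of f1)."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, l) for l in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Assumed 60/40 class split; the real corpus distributions are not given here.
y_true = ["non-anger"] * 60 + ["anger"] * 40
# Constant majority class voting: always predict the more frequent class.
y_pred = ["non-anger"] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)  # about 0.60
baseline_f1 = macro_f1(y_true, y_pred)        # about 0.375
```

Under these assumptions the majority class scores an F1 of 0.75 and the minority class scores 0, so their mean is roughly 0.375, which is in line with the reported baselines of .38 and .40.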

► We recognize angry speech from three realistic corpora in English and German.
► Large scale acoustic features are ranked and fused with word- and sequence models.
► We compare features and results for IVR and WoZ databases, adding confidences.
► Acoustic models outperform linguistics for all databases using various algorithms.
► Loudness, MFCC and pitch features are promising while BOW and TF.IDF features fail.
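Decision-level fusion by adding confidences, as mentioned in the abstract and highlights, can be sketched as follows. This is only an illustration: the unweighted sum, the function name `fuse_by_adding`, the label names and the confidence values are all assumptions, and the weighting or normalization actually used in the paper is not given on this page.

```python
def fuse_by_adding(acoustic_conf, linguistic_conf):
    """Add per-class confidences from two classifiers and pick the best class.

    Both arguments map a class label to a confidence score. A label missing
    from one classifier contributes 0.0 (an assumption of this sketch).
    """
    labels = sorted(set(acoustic_conf) | set(linguistic_conf))
    fused = {
        label: acoustic_conf.get(label, 0.0) + linguistic_conf.get(label, 0.0)
        for label in labels
    }
    # The class with the highest summed confidence is the fused decision.
    decision = max(labels, key=lambda label: fused[label])
    return decision, fused


# Made-up confidences for a single utterance, not results from the paper.
acoustic = {"anger": 0.55, "non-anger": 0.45}
linguistic = {"anger": 0.30, "non-anger": 0.70}

decision, fused = fuse_by_adding(acoustic, linguistic)
```

In this made-up case the acoustic classifier alone would choose "anger", but the stronger linguistic confidence for "non-anger" decides the fused result.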

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 53, Issues 9–10, November–December 2011, Pages 1198–1209

Authors

Tim Polzehl, Alexander Schmitt, Florian Metze, Michael Wagner
