Article code: 526902 · Journal code: 869257 · Publication year: 2013 · English article: 11-page PDF · Full-text version: free download
English title of the ISI article
LSTM-Modeling of continuous emotions in an audiovisual affect recognition framework
Related subjects
Engineering and Basic Sciences · Computer Engineering · Computer Vision and Pattern Recognition
English abstract

Automatically recognizing human emotions from spontaneous and non-prototypical real-life data is currently one of the most challenging tasks in the field of affective computing. This article presents our recent advances in assessing dimensional representations of emotion, such as arousal, expectation, power, and valence, in an audiovisual human–computer interaction scenario. Building on previous studies which demonstrate that long-range context modeling tends to increase accuracies of emotion recognition, we propose a fully automatic audiovisual recognition approach based on Long Short-Term Memory (LSTM) modeling of word-level audio and video features. LSTM networks are able to incorporate knowledge about how emotions typically evolve over time so that the inferred emotion estimates are produced under consideration of an optimal amount of context. Extensive evaluations on the Audiovisual Sub-Challenge of the 2011 Audio/Visual Emotion Challenge show how acoustic, linguistic, and visual features contribute to the recognition of different affective dimensions as annotated in the SEMAINE database. We apply the same acoustic features as used in the challenge baseline system whereas visual features are computed via a novel facial movement feature extractor. Comparing our results with the recognition scores of all Audiovisual Sub-Challenge participants, we find that the proposed LSTM-based technique leads to the best average recognition performance that has been reported for this task so far.
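The core idea of the approach is that an LSTM's gated cell state carries information about how emotion evolves across a sequence of word-level feature frames, so each emotion estimate is conditioned on the preceding context. The following toy sketch illustrates that mechanism only: it is a single-unit, scalar-weight LSTM cell (the names `lstm_step`, `run_sequence`, and all weights are illustrative assumptions, not the paper's actual architecture or trained parameters), showing how the forget/input gates let earlier inputs influence later outputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # Single-unit LSTM cell with scalar weights (toy sizes; the paper's
    # networks are of course much larger and learn these weights).
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate
    c = f * c_prev + i * g   # cell state accumulates long-range context
    h = o * math.tanh(c)     # emitted activation -> per-word estimate
    return h, c

def run_sequence(xs, w):
    """Process word-level features in order, emitting one estimate per word."""
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs
```

Because the cell state `c` is carried forward, changing an early input changes every later output, which is exactly the context sensitivity the article exploits for continuous emotion dimensions.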

Highlights
► Our emotion recognizer uses acoustic, linguistic, and visual information.
► We model the temporal continuity of affect via a suited machine learning technique.
► We built a system able to recognize arousal, expectation, power, and valence.
► It achieves the best accuracies reported so far for the 2011 AVEC Challenge.
► On average, we obtain a weighted accuracy of 65.2%.

Publisher
Database: Elsevier - ScienceDirect
Journal: Image and Vision Computing - Volume 31, Issue 2, February 2013, Pages 153–163
Authors