Article code: 528408
Journal code: 869566
Publication year: 2014
English article: 11-page PDF
Full-text version: Free download
English title of the ISI article
Non-manual grammatical marker recognition based on multi-scale, spatio-temporal analysis of head pose and facial expressions
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
English abstract


• Eyebrow gestures and periodic head movements convey linguistic information.
• We propose a multi-scale approach for recognizing non-manual grammatical markers.
• We obtain high-level features from spatio-temporal analysis of non-manual events.
• We use two-stage CRFs to recognize events and partition them into phases.
• Experiments demonstrate improved recognition of non-manual grammatical markers in ASL.

Changes in eyebrow configuration, in conjunction with other facial expressions and head gestures, are used to signal essential grammatical information in signed languages. This paper proposes an automatic recognition system for non-manual grammatical markers in American Sign Language (ASL) based on a multi-scale, spatio-temporal analysis of head pose and facial expressions. The analysis takes account of gestural components of these markers, such as raised or lowered eyebrows and different types of periodic head movements. To advance the state of the art in non-manual grammatical marker recognition, we propose a novel multi-scale learning approach that exploits spatio-temporal low-level and high-level facial features. Low-level features are based on information about facial geometry and appearance, as well as head pose, and are obtained through accurate 3D deformable model-based face tracking. High-level features are based on the identification of gestural events, of varying duration, that constitute the components of linguistic non-manual markers. Specifically, we recognize events such as raised and lowered eyebrows, head nods, and head shakes. We also partition these events into temporal phases. We separate the anticipatory transitional movement (the onset) from the linguistically significant portion of the event, and we further separate the core of the event from the transitional movement that occurs as the articulators return to the neutral position towards the end of the event (the offset). This partitioning is essential for the temporally accurate localization of the grammatical markers, which could not be achieved at this level of precision with previous computer vision methods. In addition, we analyze and use the motion patterns of these non-manual events. Those patterns, together with the information about the type of event and its temporal phases, are defined as the high-level features.
Using this multi-scale, spatio-temporal combination of low- and high-level features, we employ learning methods for accurate recognition of non-manual grammatical markers in ASL sentences.
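
The onset/core/offset partitioning described in the abstract can be illustrated with a small sketch. This is not the paper's two-stage CRF method; it is a simple threshold heuristic applied to a hypothetical per-frame eyebrow-height signal containing a single event. The function name `label_phases` and the thresholds `low` and `high` are assumptions made for illustration only.

```python
from typing import List


def label_phases(height: List[float], low: float, high: float) -> List[str]:
    """Label each frame as 'neutral', 'onset', 'core', or 'offset'.

    Assumption: `height` is a normalized eyebrow-height value per frame
    (0 = neutral position) and the clip contains at most one event.
    """
    labels = ["neutral"] * len(height)

    # Core: frames where the articulator is at or above the `high` threshold.
    core = [i for i, h in enumerate(height) if h >= high]
    if not core:
        return labels
    first_core, last_core = core[0], core[-1]

    # Grow the event outward from the core while the signal stays above `low`.
    start = first_core
    while start > 0 and height[start - 1] > low:
        start -= 1
    end = last_core
    while end < len(height) - 1 and height[end + 1] > low:
        end += 1

    for i in range(start, end + 1):
        if i < first_core:
            labels[i] = "onset"    # transitional movement into the event
        elif i <= last_core:
            labels[i] = "core"     # linguistically significant portion
        else:
            labels[i] = "offset"   # return towards the neutral position
    return labels


signal = [0.0, 0.1, 0.4, 0.9, 1.0, 0.95, 0.5, 0.2, 0.0]
print(label_phases(signal, low=0.15, high=0.8))
```

A learned sequence model such as a CRF would replace the fixed thresholds with phase boundaries inferred from labeled data. The sketch only shows what the three phase labels look like over time.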

Publisher
Database: Elsevier - ScienceDirect
Journal: Image and Vision Computing - Volume 32, Issue 10, October 2014, Pages 671–681