Article code | Journal code | Publication year | English article | Full-text version
527042 | 869276 | 2014 | 16-page PDF | Free download
English title of the ISI article
A review of recent advances in visual speech decoding
Persian translation of the title
بررسی پیشرفت‌های اخیر در رمزگشایی گفتار بصری
Keywords
Visual speech decoding, automatic speech recognition, lip reading, audio-visual speech recognition, review
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract


• A detailed review of the recent advances in the area of visual speech decoding.
• Visual features tackling speaker dependency, head poses and temporal information.
• Dynamic audio-visual speech information fusion.
• Recent techniques of facial landmark localization.
• Summary of audio-visual speech databases and ASR performance on them.

Visual speech information plays an important role in automatic speech recognition (ASR) especially when audio is corrupted or even inaccessible. Despite the success of audio-based ASR, the problem of visual speech decoding remains widely open. This paper provides a detailed review of recent advances in this research area. In comparison with the previous survey [97] which covers the whole ASR system that uses visual speech information, we focus on the important questions asked by researchers and summarize the recent studies that attempt to answer them. In particular, there are three questions related to the extraction of visual features, concerning speaker dependency, pose variation and temporal information, respectively. Another question is about audio-visual speech fusion, considering the dynamic changes of modality reliabilities encountered in practice. In addition, the state-of-the-art on facial landmark localization is briefly introduced in this paper. Those advanced techniques can be used to improve the region-of-interest detection, but have been largely ignored when building a visual-based ASR system. We also provide details of audio-visual speech databases. Finally, we discuss the remaining challenges and offer our insights into the future research on visual speech decoding.


Publisher
Database: Elsevier - ScienceDirect
Journal: Image and Vision Computing - Volume 32, Issue 9, September 2014, Pages 590–605
Authors
, , , ,