Robust visual speakingness detection using bi-level HMM

Article code	Journal code	Publication year	English article	Full-text version
533412	870113	2012	11-page PDF	Free download


Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition


Abstract

Visual voice activity detection (V-VAD) plays an important role in both human-computer interaction (HCI) and human-robot interaction (HRI), affecting both the conversation strategy and the synchronization between humans and robots or computers. The typical speakingness decision in V-VAD consists of post-processing for signal smoothing followed by classification using thresholding. Several parameters that govern the trade-off between hit rate and false-alarm rate are usually defined heuristically. This makes V-VAD approaches vulnerable to noisy observations and changes in environmental conditions, resulting in poor performance and poor robustness against undesired frequent changes of speaking state. To overcome these difficulties, this paper proposes a new probabilistic approach, named bi-level HMM, that analyzes lip activity energy for V-VAD in HRI. The design is based on assumptions about lip movement and speaking, and it combines the two essential procedures (post-processing and classification) into a single model. A bi-level HMM is an HMM with two state variables at different levels, where the occurrence of a state at the lower level depends conditionally on the state at the upper level. The approach works online with low-resolution images and under various lighting conditions, and it was tested successfully on 21 image sequences (22,927 frames). It achieved a probability of detection above 90%, an improvement of almost 20% over four other V-VAD approaches.
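To make the "typical" decision pipeline described in the abstract concrete, the following is a minimal sketch of smoothing followed by thresholding. It is not taken from the paper: the function names, the causal moving-average smoother, and the default `window` and `threshold` values are all assumptions chosen for illustration.

```python
# Hypothetical baseline: smooth a per-frame lip activity energy signal,
# then classify each frame by comparing the smoothed value to a threshold.
# The smoother, window size, and threshold are illustrative assumptions.

def moving_average(signal, window):
    """Causal moving average over the last `window` samples (fewer at the start)."""
    smoothed = []
    for i in range(len(signal)):
        start = max(0, i - window + 1)
        chunk = signal[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def threshold_vad(energy, window=3, threshold=0.5):
    """Return one boolean per frame: True means 'speaking'."""
    return [value > threshold for value in moving_average(energy, window)]


def count_state_changes(decisions):
    """Count how often the speaking decision flips between consecutive frames."""
    return sum(1 for a, b in zip(decisions, decisions[1:]) if a != b)
```

Both `window` and `threshold` have to be tuned by hand in a pipeline like this, which is the heuristic parameter choice the abstract identifies as a source of fragility.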

► Typical V-VAD is vulnerable to noisy observation and differences in illumination.
► This often results in poor robustness against undesired frequent changes of speaking state.
► We examine lip movement distributions during non-speaking and speaking sequences.
► We propose a probabilistic model, bi-level HMM, to analyze lip activity for V-VAD.
► Bi-level HMM combines post-processing and classification in a single model.
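The two-level structure named in the highlights can be sketched as a small forward filter. This is only an illustration under stated assumptions, not the authors' model: the state names, the binary "low"/"high" observation, the factorization P(S_t | S_{t-1}) · P(M_t | S_t) · P(o_t | M_t), and every probability value below are hypothetical.

```python
# Hypothetical two-level HMM: an upper speaking state S and a lower lip state M,
# where M depends on S at the same time step. All numbers are placeholders.

UPPER = ("non_speaking", "speaking")
LOWER = ("still", "moving")

# P(S_t | S_{t-1}): sticky transitions discourage rapid state flipping.
P_UPPER_TRANS = {
    "non_speaking": {"non_speaking": 0.95, "speaking": 0.05},
    "speaking": {"non_speaking": 0.05, "speaking": 0.95},
}
# P(M_t | S_t): the lower-level state is conditioned on the upper-level state.
P_LOWER_GIVEN_UPPER = {
    "non_speaking": {"still": 0.9, "moving": 0.1},
    "speaking": {"still": 0.4, "moving": 0.6},
}
# P(o_t | M_t): observation is a discretized lip activity energy.
P_OBS_GIVEN_LOWER = {
    "still": {"low": 0.8, "high": 0.2},
    "moving": {"low": 0.2, "high": 0.8},
}


def filter_speaking(observations):
    """Return P(S_t = speaking | o_1..o_t) for each frame (forward filtering)."""
    belief = {s: 0.5 for s in UPPER}  # uniform prior before the first frame
    result = []
    for obs in observations:
        # Predict the upper state one step ahead.
        predicted = {
            s: sum(belief[p] * P_UPPER_TRANS[p][s] for p in UPPER) for s in UPPER
        }
        # Weight by the observation likelihood, marginalizing the lower state.
        joint = {}
        for s in UPPER:
            likelihood = sum(
                P_LOWER_GIVEN_UPPER[s][m] * P_OBS_GIVEN_LOWER[m][obs] for m in LOWER
            )
            joint[s] = predicted[s] * likelihood
        total = sum(joint.values())
        belief = {s: joint[s] / total for s in UPPER}
        result.append(belief["speaking"])
    return result
```

With these placeholder numbers, a single "low" frame inside a run of "high" frames lowers the speaking probability but does not push it below 0.5, so the temporal smoothing and the classification both come out of the same probabilistic model.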

Publisher

Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 45, Issue 2, February 2012, Pages 783–793

Authors

P. Tiawongsombat, Mun-Ho Jeong, Joo-Seop Yun, Bum-Jae You, Sang-Rok Oh
