Voice activity detection and speaker localization using audiovisual cues

Article code	Journal code	Publication year	English article
535856	870396	2012	8-page PDF


Keywords

Multimodal analysis; Voice activity detection; User interfaces; Speaker localization; Hidden Markov models

Related topics

Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition


Abstract

This paper proposes a multimodal approach to distinguish silence from speech situations, and to identify the location of the active speaker in the latter case. In our approach, a video camera is used to track the faces of the participants, and a microphone array is used to estimate the Sound Source Location (SSL) using the Steered Response Power with the phase transform (SRP-PHAT) method. The audiovisual cues are combined, and two competing Hidden Markov Models (HMMs) are used to detect silence or the presence of a person speaking. If speech is detected, the corresponding HMM also provides the spatio-temporally coherent location of the speaker. Experimental results show that incorporating the HMM improves the results over the unimodal SRP-PHAT, and the inclusion of video cues provides even further improvements.

► Multimodal approach for HCI.
► Use of microphone array and a single camera.
► HMM for spatio-temporal coherence.
► Joint approach for voice activity detection (VAD) and Sound Source Localization (SSL).
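
The competing-HMM idea from the abstract can be illustrated with a small sketch. It is not the authors' model: it assumes each frame's SSL estimate has already been quantized into one of `n_bins` direction bins, and the function names, the transition and emission values (`stay`, `hit`), and the single-state silence model are all hypothetical choices made for illustration. The speech HMM has one state per candidate location with "sticky" transitions, so it scores spatially coherent sequences highly; the silence HMM treats every bin as equally likely. Video cues are left out; one could imagine restricting the speech states to bins where a face is tracked.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for discrete observations."""
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale  # renormalize to avoid underflow
    return loglik

def viterbi(obs, pi, A, B):
    """Most likely state sequence, computed in the log domain."""
    log_A, log_B = np.log(A), np.log(B)
    delta = np.log(pi) + log_B[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + log_A  # scores[i, j]: move from state i to j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_B[:, o]
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

def make_speech_hmm(n_bins, stay=0.9, hit=0.8):
    """Hypothetical speech model: one state per location bin.
    The speaker tends to stay put (stay) and the SSL estimate
    tends to fall in the speaker's bin (hit)."""
    A = np.full((n_bins, n_bins), (1 - stay) / (n_bins - 1))
    np.fill_diagonal(A, stay)
    B = np.full((n_bins, n_bins), (1 - hit) / (n_bins - 1))
    np.fill_diagonal(B, hit)
    pi = np.full(n_bins, 1.0 / n_bins)
    return pi, A, B

def make_silence_hmm(n_bins):
    """Hypothetical silence model: one state, SSL estimates uniform over bins."""
    return np.array([1.0]), np.array([[1.0]]), np.full((1, n_bins), 1.0 / n_bins)

def detect(obs, n_bins):
    """Pick the HMM with the higher likelihood; decode locations if speech wins."""
    speech = make_speech_hmm(n_bins)
    silence = make_silence_hmm(n_bins)
    if forward_loglik(obs, *speech) > forward_loglik(obs, *silence):
        return "speech", viterbi(obs, *speech)
    return "silence", None

# SSL estimates that stay in one bin look like a speaker; scattered ones do not.
print(detect([3, 3, 3, 3, 3, 3], n_bins=8))
print(detect([0, 5, 2, 7, 1, 4, 6, 3], n_bins=8))
```

Comparing whole-sequence likelihoods, rather than thresholding each frame, is what lets temporal coherence influence the speech/silence decision in this sketch.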

Publisher

Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 33, Issue 4, March 2012, Pages 373–380

Authors

Dante A. Blauth, Vicente P. Minotto, Claudio R. Jung, Bowon Lee, Ton Kalker
