Multimodal speaker/speech recognition using lip motion, lip texture and audio

Article code	Journal code	Publication year	English article	Full-text version
564966	875663	2006	10-page PDF	Free download


Keywords

Lip motion; Lip reading; Speaker identification; Isolated word recognition; Decision fusion

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing


Abstract

We present a new multimodal speaker/speech recognition system that integrates audio, lip texture and lip motion modalities. Fusion of audio and face texture modalities has been investigated in the literature before. The emphasis of this work is to investigate the benefits of inclusion of lip motion modality for two distinct cases: speaker and speech recognition. The audio modality is represented by the well-known mel-frequency cepstral coefficients (MFCC) along with the first and second derivatives, whereas lip texture modality is represented by the 2D-DCT coefficients of the luminance component within a bounding box about the lip region. In this paper, we employ a new lip motion modality representation based on discriminative analysis of the dense motion vectors within the same bounding box for speaker/speech recognition. The fusion of audio, lip texture and lip motion modalities is performed by the so-called reliability weighted summation (RWS) decision rule. Experimental results show that inclusion of lip motion modality provides further performance gains over those which are obtained by fusion of audio and lip texture alone, in both speaker identification and isolated word recognition scenarios.
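The abstract names the reliability weighted summation (RWS) rule but does not give its formula. As a rough illustration only, the sketch below assumes that each modality produces one score per candidate class (for example a log-likelihood), that scores are min-max normalized per modality, and that the fused score is a weighted sum with weights proportional to a per-modality reliability value. The function names, the normalization step, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def normalize(scores: List[float]) -> List[float]:
    """Min-max scale one modality's class scores to [0, 1] (an assumed choice)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A modality that cannot separate the classes contributes nothing.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def rws_fuse(scores: Dict[str, List[float]],
             reliability: Dict[str, float]) -> List[float]:
    """Weighted sum of normalized scores; weights are reliabilities scaled to sum to 1."""
    total = sum(reliability[m] for m in scores)
    if total <= 0:
        raise ValueError("reliabilities must sum to a positive value")
    n_classes = len(next(iter(scores.values())))
    fused = [0.0] * n_classes
    for modality, s in scores.items():
        if len(s) != n_classes:
            raise ValueError("every modality must score the same classes")
        weight = reliability[modality] / total
        for i, value in enumerate(normalize(s)):
            fused[i] += weight * value
    return fused


def decide(fused: List[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three candidate classes under noisy audio.
example_scores = {
    "audio": [-10.0, -12.0, -20.0],
    "lip_texture": [-5.0, -3.0, -7.0],
    "lip_motion": [-8.0, -4.0, -6.0],
}
# Hypothetical reliabilities: audio is trusted less than the lip modalities.
example_reliability = {"audio": 0.2, "lip_texture": 0.4, "lip_motion": 0.4}

example_fused = rws_fuse(example_scores, example_reliability)
example_choice = decide(example_fused)
```

With these made-up numbers, audio alone favors class 0, while the fused decision is class 1, because the two lip modalities agree and carry more weight. How reliabilities are actually estimated in the paper is not stated in the abstract.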

Publisher

Database: Elsevier - ScienceDirect
Journal: Signal Processing - Volume 86, Issue 12, December 2006, Pages 3549–3558

Authors

H.E. Çetingül, E. Erzin, Y. Yemez, A.M. Tekalp
