Article code: 558325
Journal code: 874902
Publication year: 2013
English article: 17-page PDF
Full-text version: Free download
English title of the ISI article
Multiple cameras for audio-visual speech recognition in an automotive environment
Related topics
Engineering and Basic Sciences · Computer Engineering · Signal Processing
English abstract

Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side- and centrally-oriented cameras in improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
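For context, stream fusion in a synchronous multi-stream HMM of this kind is conventionally expressed as a weighted product of per-stream observation likelihoods; the formulation below is a generic sketch (state j, streams s = 1..S, stream weights λ_s, Gaussian mixture densities per stream), not necessarily the exact parameterisation used in the paper.

b_j(\mathbf{o}_t) \;=\; \prod_{s=1}^{S} \left[ \sum_{m=1}^{M_s} c_{jsm}\, \mathcal{N}\!\left(\mathbf{o}_{st};\, \boldsymbol{\mu}_{jsm}, \boldsymbol{\Sigma}_{jsm}\right) \right]^{\lambda_s}, \qquad \lambda_s \ge 0, \quad \sum_{s=1}^{S} \lambda_s = 1,

with S = 4 for the visual-only SHMM (one stream per camera view) and S = 5 when the acoustic stream is added. In such formulations the stream weights are typically tuned to reflect the relative reliability of the audio and visual streams under the prevailing noise conditions.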


► In automotive environments, visual speech quality degrades as driving conditions change.
► Visual speech recognition accuracy increases as more cameras are made available.
► More visual speech information is available from the central cameras than from the side cameras.
► Audio-visual speech recognition accuracy also increases as more cameras are made available.

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 4, June 2013, Pages 911–927
Authors