Article code: 558325
Journal code: 874902
Publication year: 2013
English article: 17-page PDF
Full-text version: Free download
English title of the ISI article
Multiple cameras for audio-visual speech recognition in an automotive environment
Related topics
Engineering and Basic Sciences · Computer Engineering · Signal Processing
English abstract

Audio-visual speech recognition, or the combination of visual lip-reading with traditional acoustic speech recognition, has previously been shown to provide a considerable improvement over acoustic-only approaches in noisy environments, such as an automotive cabin. The research presented in this paper extends the established audio-visual speech recognition literature to show that further improvements in speech recognition accuracy can be obtained when multiple frontal or near-frontal views of a speaker's face are available. A series of visual speech recognition experiments using a four-stream visual synchronous hidden Markov model (SHMM) is conducted on the four-camera AVICAR automotive audio-visual speech database. We study the relative contributions of the side- and centrally-oriented cameras in improving visual speech recognition accuracy. Finally, combining the four visual streams with a single audio stream in a five-stream SHMM demonstrates a relative improvement of over 56% in word recognition accuracy compared to the acoustic-only approach in the noisiest conditions of the AVICAR database.
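For context, stream fusion in a synchronous multi-stream HMM of this kind is conventionally expressed as a weighted product of per-stream observation likelihoods; the formulation below is a generic sketch (state j, streams s = 1..S, stream weights λ_s, Gaussian mixture densities per stream), not necessarily the exact parameterisation used in the paper.

b_j(\mathbf{o}_t) \;=\; \prod_{s=1}^{S} \left[ \sum_{m=1}^{M_s} c_{jsm}\, \mathcal{N}\!\left(\mathbf{o}_{st};\, \boldsymbol{\mu}_{jsm}, \boldsymbol{\Sigma}_{jsm}\right) \right]^{\lambda_s}, \qquad \lambda_s \ge 0, \quad \sum_{s=1}^{S} \lambda_s = 1,

with S = 4 for the visual-only SHMM (one stream per camera view) and S = 5 when the acoustic stream is added. In such formulations the stream weights are typically tuned to reflect the relative reliability of the audio and visual streams under the prevailing noise conditions.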


► In automotive environments, visual speech quality degrades as driving conditions change.
► Visual speech recognition accuracy increases as more cameras are made available.
► More visual speech information is available from the central cameras than from the side cameras.
► Audio-visual speech recognition accuracy also increases as more cameras are made available.

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 27, Issue 4, June 2013, Pages 911–927
Authors