Free article download: Cross database audio visual speech adaptation for phonetic spoken term detection

Article code	Journal code	Publication year	English article	Full-text version
4973656	1451683	2017	36-page PDF	Free download

English title of the ISI article

Cross database audio visual speech adaptation for phonetic spoken term detection



Keywords

Spoken term detection, synchronous hidden Markov model, cross database training, phone recognition


Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing


English abstract

Spoken term detection (STD), the process of finding all occurrences of a specified search term in a large number of speech segments, has many applications in multimedia search and information retrieval. It is known that the use of video information in the form of lip movements can improve the performance of STD in the presence of audio noise. However, research in this direction has been hampered by the unavailability of large annotated audio visual databases for development. We propose a novel approach to develop audio visual spoken term detection when only a small (low resource) audio visual database is available for development. First, cross database training is proposed as a novel framework using the fused hidden Markov modeling (HMM) technique: an audio model is trained on large, publicly available audio databases and then adapted to the visual data of the given audio visual database. This approach is shown to perform better than the standard HMM joint-training method and also improves the performance of spoken term detection when used in the indexing stage. In another attempt, the external audio models are first adapted to the audio data of the given audio visual database and then adapted to the visual data. This approach also improves both phone recognition and spoken term detection accuracy. Finally, the cross database training technique is used as HMM initialization, and an extra parameter re-estimation step is applied to the initialized models using the Baum-Welch technique. The proposed approaches for audio visual model training make it possible to benefit from both the large out-of-domain audio databases that are available and the small audio visual database given for development, yielding more accurate audio visual models.

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 44, July 2017, Pages 1-21

Authors

Shahram Kalantari, David Dean, Sridha Sridharan
