Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
558222 | 1451691 | 2016 | 17-page PDF | Free download |
• We analyze why semi-supervised backoff language modeling performs poorly.
• We motivate MAP adaptation of a log-linear language model.
• We use automatic transcripts as a prior for language model estimation.
• We show consistent reduction in WER across a range of low-resource conditions.
Many under-resourced languages, such as diglossic Arabic varieties or Hindi sub-dialects, do not have sufficient in-domain text to build strong language models for use with automatic speech recognition (ASR). Semi-supervised language modeling uses a speech-to-text system to produce automatic transcripts from a large amount of in-domain audio, typically to augment a small amount of manual transcripts. In contrast to the success of semi-supervised acoustic modeling, conventional language modeling techniques have provided only modest gains. This paper first explains the limitations of back-off language models due to their dependence on long-span n-grams, which are difficult to estimate accurately from automatic transcripts. From this analysis, we motivate a more robust use of the automatic counts as a prior over the estimated parameters of a log-linear language model. We demonstrate consistent gains for semi-supervised language models across a range of low-resource conditions.
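The MAP-adaptation idea in the abstract can be illustrated with a minimal sketch: fit a log-linear language model to a small set of manual transcripts while penalizing deviation from prior parameters estimated on automatic transcripts. Everything below is a hypothetical toy (a unigram log-linear model over a three-word vocabulary, gradient ascent by hand), not the paper's actual model or feature set.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary; the paper's models use real n-gram features.
VOCAB = ["a", "b", "c"]

def softmax_probs(theta):
    """P(w) proportional to exp(theta[w]) -- a unigram log-linear model."""
    z = sum(math.exp(theta[w]) for w in VOCAB)
    return {w: math.exp(theta[w]) / z for w in VOCAB}

def map_fit(data, theta_prior, lam=1.0, lr=0.1, steps=500):
    """MAP estimation: maximize the log-likelihood of `data` minus a
    Gaussian penalty (lam/2)*||theta - theta_prior||^2.

    Here `theta_prior` plays the role of parameters estimated from large
    automatic transcripts, and `data` stands in for the small amount of
    manual transcripts; lam controls how strongly the prior is trusted.
    """
    counts = Counter(data)
    n = len(data)
    theta = dict(theta_prior)
    for _ in range(steps):
        p = softmax_probs(theta)
        for w in VOCAB:
            # Gradient of log-likelihood: observed count minus expected count,
            # plus the pull of the prior back toward theta_prior.
            grad = counts[w] - n * p[w] - lam * (theta[w] - theta_prior[w])
            theta[w] += lr * grad / n
    return theta
```

With a large `lam` the estimate stays close to the automatic-transcript prior; with `lam=0` it reduces to maximum likelihood on the manual data alone, which is exactly the trade-off semi-supervised MAP adaptation navigates.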
Journal: Computer Speech & Language - Volume 36, March 2016, Pages 93–109