Free paper download: An improved two-stage mixed language model approach for handling out-of-vocabulary words in large vocabulary continuous speech recognition

Paper ID	Journal ID	Publication year	English paper	Full text
558292	874892	2014	22-page PDF	Free download

English title of the ISI paper

An improved two-stage mixed language model approach for handling out-of-vocabulary words in large vocabulary continuous speech recognition

Related topics

Engineering and Basic Sciences > Computer Engineering > Signal Processing

Abstract

• We recover out-of-vocabulary (OOV) words in continuous speech recognition.
• OOV words are detected and transcribed with a mixed word/sub-word language model.
• Unlike traditional hybrid language models, ours consists of two independent components.
• The orthography of OOV words is retrieved with a table look-up approach.
• We outperform state-of-the-art approaches, especially on an out-of-domain task.
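The table look-up step in the highlights above can be illustrated with a short Python sketch. This is not the authors' implementation. It makes two assumptions:

- The first stage produces a token sequence in which in-vocabulary words are plain strings and each detected OOV segment is a tuple of subword units.
- An OOV lexicon pairing spellings with subword transcriptions is available.

The function names `build_lookup_table` and `resolve_oov`, the `<unk>` fallback and the syllable-like toy units are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, Union

# A subword transcription is a tuple of unit labels (hashable, so it can key a dict).
Subwords = Tuple[str, ...]
# Assumed first-stage output token: a plain string for an in-vocabulary word,
# or a Subwords tuple for a segment the mixed LM decoded as OOV.
Token = Union[str, Subwords]


def build_lookup_table(lexicon: Sequence[Tuple[str, Sequence[str]]]) -> Dict[Subwords, List[str]]:
    """Index a (hypothetical) OOV lexicon by subword transcription.

    Homophones share one transcription, so each key maps to a list of spellings.
    """
    table: Dict[Subwords, List[str]] = defaultdict(list)
    for orthography, units in lexicon:
        table[tuple(units)].append(orthography)
    return dict(table)


def resolve_oov(tokens: Sequence[Token], table: Dict[Subwords, List[str]],
                unk: str = "<unk>") -> List[str]:
    """Replace each subword segment with a word hypothesis from the table."""
    words: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            words.append(tok)  # in-vocabulary word: keep as decoded
        else:
            candidates = table.get(tuple(tok), [])
            # Naive choice: first listed spelling; unmatched segments become `unk`.
            words.append(candidates[0] if candidates else unk)
    return words


if __name__ == "__main__":
    # Toy lexicon with made-up syllable-like units (not the paper's subword inventory).
    lexicon = [
        ("Leuven", ["leu", "ven"]),
        ("Antwerpen", ["ant", "wer", "pen"]),
    ]
    table = build_lookup_table(lexicon)
    first_stage = ["ik", "woon", "in", ("leu", "ven"), "niet", "in", ("brug", "ge")]
    print(resolve_oov(first_stage, table))
    # prints ['ik', 'woon', 'in', 'Leuven', 'niet', 'in', '<unk>']
```

Keying the table on the whole subword tuple turns each look-up into a single hash query. Because homophones share a key, each entry holds a list of spellings. A fuller version would rank these spellings rather than take the first.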

This paper presents a two-stage mixed language model technique for detecting and recognizing words that are not included in the vocabulary of a large vocabulary continuous speech recognition system. The main idea is to spot the out-of-vocabulary words and to produce a transcription for these words in terms of subword units with the help of a mixed word/subword language model in the first stage, and to convert the subword transcriptions to word hypotheses by means of a look-up table in the second stage. The performance of the proposed approach is compared to that of the state-of-the-art hybrid method reported in the literature, both on in-domain and on out-of-domain Dutch spoken material, where the term ‘domain’ refers to the ensemble of topics that were covered in the material from which the lexicon and language model were retrieved. It turns out that the proposed approach is at least as effective as the hybrid approach when it comes to recognizing in-domain material, and significantly more effective when applied to out-of-domain data. This proves that the proposed approach is easily adaptable to new domains and to new words (e.g. proper names) in the same domain. On the out-of-domain recognition task, the word error rate could be reduced by 12% relative over a baseline system incorporating a 100k word vocabulary and a basic garbage OOV word model.
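The 12% figure in the abstract is a relative reduction, not an absolute one. The standard definitions are given below. Here S, D and I are the substitution, deletion and insertion counts measured against N reference words, and the notation is ours, not the paper's. Under these definitions, the improved system's out-of-domain word error rate is 88% of the baseline's:

$$\mathrm{WER} = \frac{S + D + I}{N}$$

$$\Delta_{\mathrm{rel}} = \frac{\mathrm{WER}_{\mathrm{base}} - \mathrm{WER}_{\mathrm{new}}}{\mathrm{WER}_{\mathrm{base}}} = 0.12 \quad\Longrightarrow\quad \mathrm{WER}_{\mathrm{new}} = 0.88\,\mathrm{WER}_{\mathrm{base}}$$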

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 28, Issue 1, January 2014, Pages 141–162

Authors

Bert Réveil, Kris Demuynck, Jean-Pierre Martens
