Large vocabulary continuous speech recognition of an inflected language using stems and endings

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
568928	876489	2007	16 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Search algorithm - الگوریتم جستجو Stem - ساقه Large vocabulary continuous speech recognition - معنی لغوی واژگان بزرگ

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال

پیش نمایش صفحه اول مقاله

Large vocabulary continuous speech recognition of an inflected language using stems and endings

چکیده انگلیسی

In this article, we focus on creating a large vocabulary speech recognition system for the Slovenian language. Currently, state-of-the-art recognition systems are able to use vocabularies with sizes of 20,000 to 100,000 words. These systems have mostly been developed for English, which belongs to a group of uninflectional languages. Slovenian, as a Slavic language, belongs to a group of inflectional languages. Its rich morphology presents a major problem in large vocabulary speech recognition. Compared to English, the Slovenian language requires a vocabulary approximately 10 times greater for the same degree of text coverage. Consequently, the difference in vocabulary size causes a high degree of OOV (out-of-vocabulary words). Therefore OOV words have a direct impact on recognizer efficiency. The characteristics of inflectional languages have been considered when developing a new search algorithm with a method for restricting the correct order of sub-word units, and to use separate language models based on sub-words. This search algorithm combines the properties of sub-word-based models (reduced OOV) and word-based models (the length of context). The algorithm also enables better search-space limitation for sub-word models. Using sub-word models, we increase recognizer accuracy and achieve a comparable search space to that of a standard word-based recognizer. Our methods were evaluated in experiments on a SNABI speech database.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Speech Communication - Volume 49, Issue 6, June 2007, Pages 437–452

نویسندگان

Tomaž Rotovnik, Mirjam Sepesy Maučec, Zdravko Kačič,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Large vocabulary continuous speech recognition of an inflected language using stems and endings

دسترسی سریع

ارتباط

English Website