کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
568928 876489 2007 16 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Large vocabulary continuous speech recognition of an inflected language using stems and endings
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال
پیش نمایش صفحه اول مقاله
Large vocabulary continuous speech recognition of an inflected language using stems and endings
چکیده انگلیسی

In this article, we focus on creating a large vocabulary speech recognition system for the Slovenian language. Currently, state-of-the-art recognition systems are able to use vocabularies with sizes of 20,000 to 100,000 words. These systems have mostly been developed for English, which belongs to a group of uninflectional languages. Slovenian, as a Slavic language, belongs to a group of inflectional languages. Its rich morphology presents a major problem in large vocabulary speech recognition. Compared to English, the Slovenian language requires a vocabulary approximately 10 times greater for the same degree of text coverage. Consequently, the difference in vocabulary size causes a high degree of OOV (out-of-vocabulary words). Therefore OOV words have a direct impact on recognizer efficiency. The characteristics of inflectional languages have been considered when developing a new search algorithm with a method for restricting the correct order of sub-word units, and to use separate language models based on sub-words. This search algorithm combines the properties of sub-word-based models (reduced OOV) and word-based models (the length of context). The algorithm also enables better search-space limitation for sub-word models. Using sub-word models, we increase recognizer accuracy and achieve a comparable search space to that of a standard word-based recognizer. Our methods were evaluated in experiments on a SNABI speech database.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Speech Communication - Volume 49, Issue 6, June 2007, Pages 437–452
نویسندگان
, , ,