Article code: 566184
Journal code: 875949
Publication year: 2009
English article: 11 pages, PDF
English title of the ISI article
Lexical units for Thai LVCSR
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
Abstract

Traditional language models rely on lexical units that are defined as entities separated from each other by word boundary markers. Since there are no such boundaries in Thai, alternative definitions of lexical units have to be pursued. The problem is to find the optimal set of lexical units that constitutes the vocabulary of the language model and yields the best final result. The word is the traditional lexical unit recognized by Thai people and is used by most natural language processing systems, including automatic speech recognition systems. This paper discusses problems with using the word as a lexical unit and investigates other lexical units for a Thai large vocabulary continuous speech recognition (LVCSR) system. The pseudo-morpheme is introduced and shown to be unsuitable for direct use as a lexical unit. A technique that uses pseudo-morphemes to improve the system based on the traditional word model is then introduced and yields some improvement. Next, a new lexical unit for Thai, the compound pseudo-morpheme, and an algorithm to build compound pseudo-morphemes are presented. The experimental results show that the system using compound pseudo-morphemes outperforms the other systems. Thus, the compound pseudo-morpheme is the most suitable lexical unit for a Thai LVCSR system.

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 51, Issue 4, April 2009, Pages 379–389