Article code: 515511
Journal code: 867036
Publication year: 2009
English article: 14-page PDF
Full-text version: Free download
English title of the ISI article
A lemmatization method for Mongolian and its application to indexing for information retrieval
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract

In Mongolian, two different alphabets are used, Cyrillic and Mongolian. In this paper, we focus solely on the Mongolian language using the Cyrillic alphabet, in which a content word can be inflected when concatenated with one or more suffixes. Identifying the original form of content words is crucial for natural language processing and information retrieval. We propose a lemmatization method for Mongolian. The advantage of our lemmatization method is that it does not rely on noun dictionaries, enabling us to lemmatize out-of-dictionary words. We also apply our method to indexing for information retrieval. We use newspaper articles and technical abstracts in experiments that show the effectiveness of our method. Our research is the first significant exploration of the effectiveness of lemmatization for information retrieval in Mongolian.

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 45, Issue 4, July 2009, Pages 438–451