Article code: 485462
Journal code: 703327
Publication year: 2016
English article: 7 pages, PDF
English title of the ISI article
Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
Abstract

Statistical Machine Translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT using an English-Malay parallel corpus that we acquired in the computer science domain. The training parallel corpus, however, is from a general domain, so many out-of-vocabulary (OOV) words occur during translation. We attempt to improve English-Malay SMT in the computer science domain using a dictionary and an English lemmatizer. Our study shows that combining a bilingual dictionary with English lemmatization improves the BLEU score for English-to-Malay translation from 12.90 to 15.41.
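The abstract does not say how the dictionary and the lemmatizer are combined, so the following is only a minimal sketch of one plausible reading: when an English word is missing from the training vocabulary, look it up in the bilingual dictionary, and if the surface form is not listed, retry with its lemma. The function name `resolve_oov`, the lookup-table "lemmatizer", and the tiny vocabulary and dictionary entries are all hypothetical illustrations, not the authors' data or implementation.

```python
from typing import Dict, Optional, Set


def resolve_oov(
    word: str,
    known_vocab: Set[str],
    dictionary: Dict[str, str],
    lemmas: Dict[str, str],
) -> Optional[str]:
    """Return a dictionary translation for an out-of-vocabulary word.

    Returns None when the word is already in the training vocabulary
    (the SMT system can handle it) or when no dictionary entry is found
    for either the surface form or its lemma.
    """
    token = word.lower()
    if token in known_vocab:
        return None  # not OOV: leave it to the SMT system
    if token in dictionary:
        return dictionary[token]  # direct dictionary hit
    # Assumption: a lookup table stands in for a real English lemmatizer.
    lemma = lemmas.get(token, token)
    return dictionary.get(lemma)  # None if the lemma is also unlisted


# Hypothetical toy data, for illustration only.
KNOWN_VOCAB = {"the", "is", "slow"}
DICTIONARY = {"network": "rangkaian", "file": "fail"}
LEMMAS = {"networks": "network", "files": "file"}

print(resolve_oov("networks", KNOWN_VOCAB, DICTIONARY, LEMMAS))
```

Here "networks" is absent from both the training vocabulary and the dictionary, but its lemma "network" has a dictionary entry, so lemmatization recovers a translation that a plain dictionary lookup would miss.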

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 81, 2016, Pages 243–249