Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
485462 | Procedia Computer Science | 2016 | 7 Pages |
Abstract
Statistical Machine Translation (SMT) is one of the most popular methods for machine translation. In this work, we carried out English-Malay SMT by acquiring an English-Malay parallel corpus in computer science domain. On the other hand, the training parallel corpus is from a general domain. Thus, there will be a lot of out of vocabulary during translation. We attempt to improve the English-Malay SMT in computer science domain using a dictionary and an English lemmatizer. Our study shows that a combination of approach using bilingual dictionary and English lemmatization improves the BLEU score for English to Malay translation from 12.90 to 15.41.
Keywords
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)
Authors
Yin-Lai Yeong, Tien-Ping Tan, Siti Khaotijah Mohammad,