Extraction of transliteration pairs from parallel corpora using a statistical transliteration model

Article ID	Journal	Published Year	Pages	File Type
396458	Information Sciences	2006	24 Pages	PDF

Abstract

This paper describes a framework for modeling the machine transliteration problem. The parameters of the proposed model are automatically acquired through statistical learning from a bilingual proper name list. Unlike previous approaches, the model does not involve the use of either a pronunciation dictionary for converting source words into phonetic symbols or manually assigned phonetic similarity scores between source and target words. We also report how the model is applied to extract proper names and corresponding transliterations from parallel corpora. Experimental results show that the average rates of word and character precision are 93.8% and 97.8%, respectively.
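The two figures reported above can be illustrated with a small sketch. The abstract does not define either metric, so everything here is an assumption: word precision is taken as the fraction of extracted transliterations that exactly match a reference, and character precision as the number of characters matched by a longest common subsequence divided by the number of extracted characters. The function names and the toy romanized data are hypothetical and are not taken from the paper.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def word_precision(extracted: list[str], reference: list[str]) -> float:
    """Assumed definition: share of extracted words identical to the reference."""
    if not extracted:
        raise ValueError("no extracted words to score")
    correct = sum(1 for e, r in zip(extracted, reference) if e == r)
    return correct / len(extracted)


def character_precision(extracted: list[str], reference: list[str]) -> float:
    """Assumed definition: LCS-matched characters over all extracted characters."""
    total = sum(len(e) for e in extracted)
    if total == 0:
        raise ValueError("no extracted characters to score")
    matched = sum(lcs_length(e, r) for e, r in zip(extracted, reference))
    return matched / total


# Toy data (hypothetical): the second extracted word has one wrong character.
extracted = ["abcd", "xyz"]
reference = ["abcd", "xyw"]
print(word_precision(extracted, reference))       # 1 of 2 words match exactly
print(character_precision(extracted, reference))  # 6 of 7 characters match
```

Under these assumed definitions character precision is never lower than word precision when all words have the same length, because a word with a single wrong character counts as fully wrong at the word level but mostly right at the character level. This is consistent with the ordering of the reported figures (93.8% versus 97.8%), though the paper's own definitions may differ.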

Keywords

Parallel corpora; Statistical learning