Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
515868 | 867124 | 2014 | 15-page PDF | Free download |
• We propose a novel method for mining high-quality translations from a comparable corpus.
• We introduce the Term Association Network (TAN) for mining translation knowledge.
• We propose a new method for checking term translation validity using cross outlier detection.
• Results show that the proposed methods significantly outperform the dictionary-based method.
• Our methods are especially effective in translating OOV terms by expanding query words.
Knowledge acquisition and bilingual terminology extraction from multilingual corpora are challenging tasks for cross-language information retrieval. In this study, we propose a novel method for mining high-quality translation knowledge from a Persian–English comparable corpus that we constructed, the University of Tehran Persian–English Comparable Corpus (UTPECC). We extract translation knowledge based on a Term Association Network (TAN) constructed from term co-occurrences within the same language as well as term associations across languages. We further propose a post-processing step that checks term translation validity by detecting mistranslated terms as outliers. Evaluation results on two different data sets show that translating queries using UTPECC and the proposed methods significantly outperforms simple dictionary-based methods. Moreover, the experimental results show that our methods are especially effective in translating out-of-vocabulary (OOV) terms and in expanding query words based on their associated terms.
Journal: Information Processing & Management - Volume 50, Issue 2, March 2014, Pages 384–398
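
The abstract describes a network built from same-language term co-occurrences and cross-language term associations, but does not give the scoring details. The following is only a minimal illustrative sketch of that general idea, not the authors' TAN construction: it assumes the comparable corpus is available as aligned document pairs, counts how often terms appear together, and ranks candidate translations with the Dice coefficient (an association measure chosen here for simplicity). All function names and the toy transliterated tokens are hypothetical, and the outlier-based validity check is not covered.

```python
from collections import Counter
from itertools import combinations


def build_association_counts(aligned_docs):
    """Count term associations over aligned (source_tokens, target_tokens) pairs.

    Assumption: each pair holds two comparable documents on the same topic.
    """
    src_df, tgt_df = Counter(), Counter()  # document frequency per term
    cross = Counter()  # (src_term, tgt_term) -> aligned pairs containing both
    same = Counter()   # (src_term, src_term) -> source docs containing both
    for src_tokens, tgt_tokens in aligned_docs:
        src_terms, tgt_terms = set(src_tokens), set(tgt_tokens)
        src_df.update(src_terms)
        tgt_df.update(tgt_terms)
        # Cross-language edges: every source term with every target term.
        for s in src_terms:
            for t in tgt_terms:
                cross[(s, t)] += 1
        # Same-language edges: unordered pairs of source terms.
        for a, b in combinations(sorted(src_terms), 2):
            same[(a, b)] += 1
    return src_df, tgt_df, cross, same


def dice(pair_count, count_a, count_b):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    return 2.0 * pair_count / (count_a + count_b)


def translation_candidates(term, src_df, tgt_df, cross, top_k=3):
    """Rank target-language terms by their association with a source term."""
    scored = [
        (t, dice(n, src_df[s], tgt_df[t]))
        for (s, t), n in cross.items()
        if s == term
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_k]
```

With a toy corpus in which a source term and one target term always appear in the same aligned pairs, that target term ranks first. The same-language counts in `same` could be used in a similar way to suggest related terms for query expansion.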