Article code: 515868
Journal code: 867124
Publication year: 2014
English article: 15 pages
Full-text version: PDF
English title (ISI article)
Mining a Persian–English comparable corpus for cross-language information retrieval
Keywords
Comparable corpora; Cross-language information retrieval; Term association network; Translation validity check
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract


• We propose a novel method for mining high-quality translations from a comparable corpus.
• We introduce a Term Association Network (TAN) for mining translation knowledge.
• We propose a new method for checking term translation validity using cross outlier detection.
• Results show that the proposed methods significantly outperform the dictionary-based method.
• Our methods are especially effective in translating OOV terms and in expanding query words.

Knowledge acquisition and bilingual terminology extraction from multilingual corpora are challenging tasks for cross-language information retrieval. In this study, we propose a novel method for mining high-quality translation knowledge from the Persian–English comparable corpus we constructed, the University of Tehran Persian–English Comparable Corpus (UTPECC). We extract translation knowledge based on a Term Association Network (TAN) constructed from term co-occurrences within the same language as well as term associations across languages. We further propose a post-processing step that checks term translation validity by detecting mistranslated terms as outliers. Evaluation results on two different data sets show that translating queries using UTPECC and the proposed methods significantly outperforms simple dictionary-based methods. Moreover, the experimental results show that our methods are especially effective in translating out-of-vocabulary (OOV) terms and in expanding query words based on their associated terms.
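The abstract does not say how the Term Association Network is weighted or how the outlier detection works, so the following is only a toy sketch under stated assumptions, not the paper's method. It assumes the comparable corpus is given as aligned (Persian document, English document) pairs, uses the Dice coefficient for both same-language co-occurrence edges and cross-language association edges, and replaces the paper's cross outlier detection with a simple coherence filter: a candidate translation is dropped if its average same-language association with the other candidates is unusually low. All function names (`cross_associations`, `mono_associations`, `translate`, `drop_outliers`) and the transliterated toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def doc_freqs(docs):
    """Document frequency: number of documents containing each term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def cross_associations(pairs):
    """Assumed Dice weight for each (source term, target term) edge, counted
    over aligned document pairs: 2 * co(s, t) / (df(s) + df(t))."""
    src_df = doc_freqs([src for src, _ in pairs])
    tgt_df = doc_freqs([tgt for _, tgt in pairs])
    co = Counter()
    for src_doc, tgt_doc in pairs:
        for s in set(src_doc):
            for t in set(tgt_doc):
                co[(s, t)] += 1
    return {(s, t): 2 * c / (src_df[s] + tgt_df[t]) for (s, t), c in co.items()}


def mono_associations(docs):
    """Assumed Dice weight for each same-language edge; keys are sorted pairs."""
    df = doc_freqs(docs)
    co = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[(a, b)] += 1
    return {(a, b): 2 * c / (df[a] + df[b]) for (a, b), c in co.items()}


def translate(term, cross, k=3):
    """Top-k target terms by cross-language weight (ties broken alphabetically)."""
    cands = [(t, w) for (s, t), w in cross.items() if s == term]
    cands.sort(key=lambda tw: (-tw[1], tw[0]))
    return cands[:k]


def drop_outliers(cands, mono, z=1.0):
    """Hypothetical stand-in for the validity check: drop candidates whose mean
    same-language weight to the other candidates is more than z population
    standard deviations below the average over all candidates."""
    if len(cands) < 3:
        return cands
    terms = [t for t, _ in cands]
    coherence = {
        t: mean(mono.get(tuple(sorted((t, u))), 0.0) for u in terms if u != t)
        for t in terms
    }
    mu, sd = mean(coherence.values()), pstdev(coherence.values())
    return [(t, w) for t, w in cands if coherence[t] >= mu - z * sd]


if __name__ == "__main__":
    # Toy aligned pairs (transliterated Persian, English); not taken from UTPECC.
    pairs = [
        (["ketab"], ["book", "library"]),
        (["ketab"], ["book", "library"]),
        (["ketab"], ["book", "weather"]),
        (["hava"], ["weather", "rain"]),
        (["hava"], ["weather", "rain"]),
    ]
    cross = cross_associations(pairs)
    mono = mono_associations([tgt for _, tgt in pairs])
    cands = translate("ketab", cross)
    print(cands)
    print(drop_outliers(cands, mono))
```

In this toy data, "weather" co-occurs with "ketab" in only one aligned pair and has no English co-occurrence with "library", so its average association with the other candidates falls more than one standard deviation below the mean and the filter drops it, leaving "book" and "library". The threshold `z=1.0` is an arbitrary illustration.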

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 50, Issue 2, March 2014, Pages 384–398