Building Subject-aligned Comparable Corpora and Mining it for Truly Parallel Sentence Pairs

Article code	Journal code	Publication year	English article	Full-text version
492658	721632	2014	7-page PDF	Free download

Related subjects

Engineering and basic sciences; Computer engineering; Computer science (general)

Abstract

Parallel sentences are a relatively scarce but extremely useful resource for many applications including cross-lingual retrieval and statistical machine translation. This research explores our methodology for mining such data from previously obtained comparable corpora. The task is highly practical since non-parallel multilingual data exist in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Here we propose a web crawling method for building subject-aligned comparable corpora from Wikipedia articles. We also introduce a method for extracting truly parallel sentences that are filtered out from noisy or just comparable sentence pairs. We describe our implementation of a specialized tool for this task as well as training and adaption of a machine translation system that supplies our filter with additional information about the similarity of comparable sentence pairs.
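
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the abstract does not specify the similarity measure, the threshold, or the language pair, so the Jaccard token overlap, the 0.6 cutoff, the German-English example sentences, and the names `similarity`, `filter_parallel`, and `toy_translate` are all assumptions made for illustration. The dictionary lookup stands in for a trained machine translation system.

```python
import re


def tokens(sentence):
    """Return the set of lowercase word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"\w+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap between the token sets of two sentences (assumed measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def filter_parallel(pairs, translate, threshold=0.6):
    """Keep (source, target) pairs whose translated source is close to the target.

    `translate` is any callable mapping a source sentence to a target-language
    string; `threshold` is an illustrative value, not one taken from the paper.
    """
    kept = []
    for src, tgt in pairs:
        if similarity(translate(src), tgt) >= threshold:
            kept.append((src, tgt))
    return kept


# Hypothetical stand-in for an MT system: a fixed lookup table.
TOY_MT = {
    "der hund schläft": "the dog sleeps",
    "die katze trinkt milch": "the cat drinks milk",
}


def toy_translate(sentence):
    """Look up a canned translation; unknown sentences translate to ''."""
    return TOY_MT.get(sentence.lower(), "")


# Candidate pairs as they might come out of a comparable corpus.
pairs = [
    ("Der Hund schläft", "The dog sleeps"),                                # parallel
    ("Die Katze trinkt Milch", "Cats are popular pets around the world"),  # only comparable
]
parallel = filter_parallel(pairs, toy_translate)
```

In this toy run only the first pair survives the filter, since the second pair shares a topic but not a meaning. A real pipeline would replace `toy_translate` with an actual translation system and would likely use a more robust similarity score.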

Publisher

Database: Elsevier - ScienceDirect
Journal: Procedia Technology - Volume 18, 2014, Pages 126-132
