Free article download: Statistical machine translation of subtitles for highly inflected language pair

Article code	Journal code	Publication year	English article	Full-text version
534280	870241	2014	8-page PDF	Free download

English title of the ISI article

Statistical machine translation of subtitles for highly inflected language pair

Persian translation of the title

ترجمه ماشین آماری از زیرنویس برای جفت زبان بسیار فشرده

Keywords

Statistical machine translation, phrase-based translation, highly inflected languages, bilingual dictionary, entropy

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition

English abstract

• We present a study on phrase-based statistical machine translation between two closely related languages.
• Linguistic information improves translation quality when dealing with highly inflected languages.
• Integrating a dictionary is beneficial in the lemma translation component of a statistical machine translation (SMT) system.
• SMT produces useful translations in the subtitle domain.

This paper addresses the problem of statistical machine translation between highly inflected languages. Even when dealing with closely-related language pairs, statistical machine translation encounters problems if the parallel corpus is not big enough. To reduce the problem of data sparsity, we use the approach called factored translation, which has proven successful when translating between English and a morphologically rich language. We show that it is even more useful when translating between two highly inflected languages. The main contribution of the paper involves two extensions of the factored translation approach. First, we propose a new, more general asynchronous framework for training translation components, where lemmas in the lemma component and MSD tags in the MSD component are aligned independently of alignment done for surface word forms. The second contribution of the paper is a new technique for efficient use of a bilingual dictionary in the translation process. A dictionary is introduced into the lemma component to improve lexical translation. Dictionary use is based on entropy. We tested our enhanced translation approach on the Slovenian–Serbian language pair. The system was trained on a freely available OpenSubtitle corpus. The results show improvements in automatic scores (BLEU and TER). The approach could be used for other language pairs, especially if one or both are highly inflected.
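
The abstract states only that dictionary use "is based on entropy" and does not give the exact criterion. As a rough illustration, the sketch below assumes one plausible reading: compute the Shannon entropy of the empirical lemma translation distribution p(target lemma | source lemma) and flag high-entropy source lemmas as candidates for dictionary support. The function names, the placeholder lemma pairs, and the threshold are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def translation_entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution over target lemmas."""
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy += p * math.log2(1 / p)
    return entropy


def build_table(aligned_pairs):
    """Count how often each source lemma is aligned to each target lemma."""
    table = {}
    for source, target in aligned_pairs:
        table.setdefault(source, Counter())[target] += 1
    return table


def ambiguous_lemmas(table, threshold_bits):
    """Source lemmas whose translation entropy exceeds the (assumed) threshold."""
    return sorted(
        source
        for source, counts in table.items()
        if translation_entropy(counts) > threshold_bits
    )


# Placeholder data, not real corpus alignments.
aligned_pairs = [
    ("src_a", "tgt_1"), ("src_a", "tgt_1"), ("src_a", "tgt_1"), ("src_a", "tgt_1"),
    ("src_b", "tgt_2"), ("src_b", "tgt_3"), ("src_b", "tgt_4"), ("src_b", "tgt_5"),
]

table = build_table(aligned_pairs)
# threshold_bits=1.0 is an arbitrary value chosen for the illustration.
candidates = ambiguous_lemmas(table, threshold_bits=1.0)
print(candidates)
```

In this toy setup, a lemma that is always aligned to the same target has zero entropy, while a lemma spread evenly over four targets has two bits; only the latter would be flagged. How the actual system combines such a score with dictionary entries is described in the full paper, not in this abstract.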

Publisher

Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 46, 1 September 2014, Pages 96–103

Authors

Mirjam Sepesy Maučec, Zdravko Kačič, Darinka Verdonik
