Article code: 4973674
Journal code: 1451682
Publication year: 2017
English article: 15-page PDF
Full-text version: Free download
English title of the ISI article
Source sentence simplification for statistical machine translation
Persian translation of the title
ساده سازی جمله منبع برای ترجمه ماشین آماری
Keywords
Hierarchical machine translation, text simplification, neural machine translation
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


- Long and complex input sentences can be a challenge for translation systems.
- Source simplification is a way to reduce the complexity of the input.
- Translation lattices make it possible to combine the output spaces of full and simplified inputs.
- Constraining the hypothesis space to translations of simplified inputs can be beneficial.

Long sentences with complex syntax and long-distance dependencies pose difficulties for machine translation systems. Short sentences, on the other hand, are usually easier to translate. We study the potential of addressing this mismatch using text simplification: given a simplified version of the full input sentence, can we use it in addition to the full input to improve translation? We show that the spaces of original and simplified translations can be effectively combined using translation lattices and compare two decoding approaches to process both inputs at different levels of integration. We demonstrate on source-annotated portions of WMT test sets and on top of strong baseline systems combining hierarchical and neural translation for two language pairs that source simplification can help to improve translation quality.

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 45, September 2017, Pages 221-235