Article code: 535282
Journal code: 870336
Publication year: 2015
English article: 9 pages
Full-text version: PDF
English title of the ISI article
Conditional random fields versus template-matching in MT phrasing tasks involving sparse training data
Related subjects
Engineering and Basic Sciences / Computer Engineering / Computer Vision and Pattern Recognition
English abstract


• Template matching and conditional random fields are compared on the task of phrasing.
• The study concerns machine translation applications in which training data are sparse.
• Template matching provides a more effective phrasing scheme than probabilistic models such as CRF.

This communication compares the template-matching technique with established probabilistic approaches, such as conditional random fields (CRF), on a specific linguistic task: phrasing, i.e. splitting a sequence of words into phrases. This task amounts to a low-level parsing of the sequence into linguistically motivated phrases. CRF is the established method for implementing such a data-driven parser, while template matching is a simpler method that is faster to train and to run. The two techniques are compared here to determine which is better suited to extracting an accurate model. The application studied belongs to a specific machine translation (MT) methodology, PRESEMT, though the comparison also holds for other applications for which only sparse training data are available. PRESEMT uses small parallel corpora to learn structural transformations from a source language (SL) to a target language (TL) and thus translate input text. As a result, only sparse training data are available for training the parser. Experimental results indicate that for a limited-size training set, as is the case in PRESEMT, template matching yields a superior phrasing model, which in turn produces higher-quality translations. This is confirmed for more than one source/target language pair and for multiple independent test sets.
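The abstract does not describe how the template-matching phraser works internally. The sketch below shows one plausible form, assuming that each training phrase is represented by its sequence of part-of-speech tags and that a new tagged sentence is segmented by greedy longest match against the learned templates. The names `learn_templates` and `phrase_sentence`, the tag set, and the toy corpus are all hypothetical. This is not PRESEMT's actual implementation.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: a phrase is the tuple of its POS tags.
Phrase = Tuple[str, ...]


def learn_templates(corpus: Sequence[Sequence[Phrase]]) -> Set[Phrase]:
    """Collect every phrase-level tag sequence seen in the annotated corpus."""
    return {tuple(phrase) for sentence in corpus for phrase in sentence}


def phrase_sentence(tags: Sequence[str], templates: Set[Phrase]) -> List[Phrase]:
    """Segment a POS-tagged sentence by greedy left-to-right longest template match.

    Spans that match no learned template fall back to single-token phrases.
    """
    # Longest template length bounds the search; at least 1 so the loop always advances.
    max_len = max([1, *(len(t) for t in templates)])
    phrases: List[Phrase] = []
    i = 0
    while i < len(tags):
        # Try the longest candidate first and shrink it until a template matches;
        # a length-1 candidate is always accepted as the fallback.
        for n in range(min(max_len, len(tags) - i), 0, -1):
            candidate = tuple(tags[i:i + n])
            if n == 1 or candidate in templates:
                phrases.append(candidate)
                i += n
                break
    return phrases


if __name__ == "__main__":
    # Toy training set (hypothetical): two sentences already split into tagged phrases.
    corpus = [
        [("DT", "NN"), ("VBD",), ("IN", "DT", "JJ", "NN")],
        [("PRP",), ("VBZ",), ("DT", "JJ", "NN")],
    ]
    templates = learn_templates(corpus)
    print(phrase_sentence(["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"], templates))
```

On the example sentence this prints `[('DT', 'JJ', 'NN'), ('VBD',), ('IN',), ('DT', 'NN')]`. The toy corpus never contains the template `IN DT NN`, so the preposition falls back to a phrase of its own. This is the kind of coverage gap that sparse training data produce. A CRF would instead score whole label sequences with learned feature weights, which this sketch does not attempt.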


Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 53, 1 February 2015, Pages 44–52
Authors
,