Towards accurate predictors of word quality for Machine Translation: Lessons learned on French–English and English

Article ID	Journal	Published Year	Pages	File Type
378800	Data & Knowledge Engineering	2015	11 Pages	PDF

Abstract

This paper proposes ideas for building effective estimators that predict the quality of individual words in Machine Translation (MT) output. We propose a number of novel features of various types (system-based, lexical, syntactic and semantic) and integrate them with the conventional, previously used feature set to train our baseline classifiers. The classifiers are built on two bilingual corpora: French–English (fr–en) and English–Spanish (en–es). After experimenting with all features, we apply a “Feature Selection” strategy to retain the best-performing ones. We then combine multiple “weak” classifiers into a strong “composite” classifier that exploits their complementarity, which yields a significant improvement in F-score for both the fr–en and en–es systems. Finally, we use the word confidence scores to improve the quality estimation system at the sentence level.
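To make the evaluation terms in the abstract concrete, the following minimal sketch shows how an F-score over word-level quality labels might be computed, and how word confidence scores could be averaged into a naive sentence-level signal. The label names ("G" for good, "B" for bad), the function names, and the use of a plain mean are illustrative assumptions; the abstract does not specify the label set or how the word scores are used at sentence level.

```python
from typing import Sequence


def f_score(gold: Sequence[str], pred: Sequence[str], label: str = "B") -> float:
    """F1 for one label over aligned word-level gold and predicted tags.

    The default label "B" (bad word) is an assumed tag name.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No correct predictions for this label: F1 is defined here as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sentence_score(word_confidences: Sequence[float]) -> float:
    """Mean word confidence, a hypothetical sentence-level feature."""
    if not word_confidences:
        raise ValueError("sentence has no words")
    return sum(word_confidences) / len(word_confidences)


# Toy example: five words of one MT hypothesis.
gold = ["G", "B", "B", "G", "B"]
pred = ["G", "B", "G", "B", "B"]
bad_f1 = f_score(gold, pred, label="B")
mean_conf = sentence_score([0.5, 1.0, 0.0])
```

Reporting the F-score per label, rather than overall accuracy, is a common choice when one class is much rarer than the other, as erroneous words typically are in MT output.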

Keywords

Confidence measure; Confidence estimation; Machine translation; Boosting; Conditional random fields