Article ID: 378800
Journal: Data & Knowledge Engineering
Published Year: 2015
Pages: 11
File Type: PDF
Abstract

This paper proposes methods for building effective estimators that predict the quality of individual words in Machine Translation (MT) output. We introduce a number of novel features of various types (system-based, lexical, syntactic and semantic) and integrate them with the conventional, previously used feature set to train our baseline classifiers. The classifiers are built over two bilingual corpora: French–English (fr–en) and English–Spanish (en–es). After experimenting with all features, we apply a “Feature Selection” strategy to retain the best-performing ones. We then combine multiple “weak” classifiers into a strong “composite” classifier that exploits their complementarity, which yields a significant improvement in F-score for both the fr–en and en–es systems. Finally, we use the word confidence scores to improve quality estimation at the sentence level.
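
To make the idea of a “composite” classifier and its F-score evaluation concrete, here is a minimal sketch. It is not the paper's method: the abstract does not say how the weak classifiers are combined, so the weighted vote below is an assumption, as are the label names ("G" for good, "B" for bad) and the function names `composite_predict` and `f_score`.

```python
from typing import Callable, Dict, Sequence

# Assumption: a weak classifier maps one word's feature dict to "G" (good) or "B" (bad).
WeakClassifier = Callable[[Dict[str, float]], str]


def composite_predict(
    features: Dict[str, float],
    classifiers: Sequence[WeakClassifier],
    weights: Sequence[float],
) -> str:
    """Combine weak classifiers by a weighted vote (illustrative choice only)."""
    score = 0.0
    for clf, weight in zip(classifiers, weights):
        # A "B" vote pushes the score up, a "G" vote pushes it down.
        score += weight if clf(features) == "B" else -weight
    return "B" if score > 0 else "G"


def f_score(gold: Sequence[str], pred: Sequence[str], label: str = "B") -> float:
    """F-score (harmonic mean of precision and recall) for one label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In such a scheme, a weak classifier that is reliable where the others fail can be given a larger weight, which is one simple way to take advantage of complementarity.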
