Bilingual recursive neural network based data selection for statistical machine translation

Article ID	Journal	Published Year	Pages	File Type
571783	Knowledge-Based Systems	2016	10 Pages	PDF

Abstract

Data selection is a widely used and effective solution to domain adaptation in statistical machine translation (SMT). The dominant methods are perplexity-based; they do not consider whether the two sides of a sentence pair are mutual translations, and they tend to select short sentences. To address these problems, we propose bilingual semi-supervised recursive neural network data selection methods that differentiate domain-relevant data from out-of-domain data. The proposed methods are evaluated on the task of building domain-adapted SMT systems. We present extensive comparisons and show that the proposed methods outperform state-of-the-art data selection approaches.
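For readers unfamiliar with the perplexity-based baseline mentioned above, the following is a minimal sketch of one common perplexity-based criterion, the cross-entropy difference between an in-domain and a general-domain language model. It is not the method proposed in this paper. The function names (`train_unigram`, `cross_entropy`, `rank_by_ce_difference`) are hypothetical, and an add-one-smoothed unigram model stands in for the n-gram language models such systems would normally use. Note that the score looks at one language side only, so it cannot tell whether a sentence pair is actually a mutual translation.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a function giving add-one-smoothed unigram log-probabilities.

    Assumption: whitespace tokenization and a unigram model, chosen only
    to keep the sketch short.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def cross_entropy(sentence, logprob):
    """Per-token cross-entropy (in nats) of a non-empty sentence."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / len(toks)


def rank_by_ce_difference(pool, in_domain, general):
    """Sort candidate sentences so the most in-domain-like come first.

    Score = H_in(s) - H_general(s); lower means closer to the in-domain data.
    Empty sentences are dropped because their cross-entropy is undefined.
    """
    lp_in = train_unigram(in_domain)
    lp_gen = train_unigram(general)
    candidates = [s for s in pool if s.split()]
    return sorted(
        candidates,
        key=lambda s: cross_entropy(s, lp_in) - cross_entropy(s, lp_gen),
    )


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    in_domain = ["the patient received the drug", "the drug dose was reduced"]
    general = ["the market fell today", "the team won the match"]
    pool = ["the team won today", "the drug was reduced"]
    for sentence in rank_by_ce_difference(pool, in_domain, general):
        print(sentence)
```

In practice the top-ranked portion of the pool would be kept as training data for the domain-adapted system; the cutoff is a tuning choice that this sketch leaves open.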

Keywords

Data selection; Domain adaptation; Machine translation; Autoencoder; Recursive neural network