Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
571783 | Knowledge-Based Systems | 2016 | 10 Pages |
Abstract
Data selection is a widely used and effective approach to domain adaptation in statistical machine translation (SMT). The dominant methods are perplexity-based; they do not consider the mutual translations of sentence pairs and tend to select short sentences. To address these problems, we propose bilingual semi-supervised recursive neural network data selection methods that differentiate domain-relevant data from out-of-domain data. The proposed methods are evaluated on the task of building domain-adapted SMT systems. We present extensive comparisons and show that the proposed methods outperform state-of-the-art data selection approaches.
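For context, the perplexity-based baseline that the abstract contrasts against can be sketched roughly as below. This is not the recursive neural network method proposed in the paper. It is a minimal illustration of cross-entropy difference scoring on one language side only, assuming toy unigram language models with add-one smoothing in place of the n-gram models a real system would use. All function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a function giving add-one-smoothed unigram log-probabilities."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab))

    return logprob


def cross_entropy(sentence, logprob):
    """Per-word cross-entropy of a sentence (log of its perplexity)."""
    words = sentence.split()
    return -sum(logprob(w) for w in words) / max(len(words), 1)


def rank_by_ce_difference(candidates, in_domain, general):
    """Sort candidates so the most in-domain-looking sentences come first.

    Score = H_in(s) - H_general(s); lower means closer to the in-domain data.
    Only one language side is scored, so translation quality is ignored.
    """
    lp_in = train_unigram(in_domain)
    lp_gen = train_unigram(general)

    def score(s):
        return cross_entropy(s, lp_in) - cross_entropy(s, lp_gen)

    return sorted(candidates, key=score)


# Toy data, invented purely for illustration.
in_domain = ["the patient received a dose", "the dose was reduced"]
general = ["the market fell sharply", "the patient investor waited", "shares fell"]
candidates = ["the market fell", "the patient received treatment"]

ranked = rank_by_ce_difference(candidates, in_domain, general)
print(ranked[0])
```

Because the score looks at a single language side, it cannot tell whether a selected sentence pair is actually a good translation, which is one of the limitations the abstract points out.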
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Derek F. Wong, Yi Lu, Lidia S. Chao,