| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 571783 | Knowledge-Based Systems | 2016 | 10 Pages | |
Abstract
Data selection is a widely used and effective solution to domain adaptation in statistical machine translation (SMT). The dominant methods are perplexity-based ones, which do not consider the mutual translations of sentence pairs and tend to select short sentences. In this paper, to address these problems, we propose bilingual semi-supervised recursive neural network data selection methods to differentiate domain-relevant data from out-domain data. The proposed methods are evaluated in the task of building domain-adapted SMT systems. We present extensive comparisons and show that the proposed methods outperform the state-of-the-art data selection approaches.
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Derek F. Wong, Yi Lu, Lidia S. Chao