Matching meaning for cross-language information retrieval

Article ID	Journal	Published Year	Pages	File Type
515428	Information Processing & Management	2012	23 Pages	PDF

Abstract

This article describes a framework for cross-language information retrieval that efficiently leverages statistical estimation of translation probabilities. The framework provides a unified perspective into which some earlier work on techniques for cross-language information retrieval based on translation probabilities can be cast. Modeling synonymy and filtering translation probabilities using bidirectional evidence are shown to yield a balance between retrieval effectiveness and query-time (or indexing-time) efficiency that seems well suited to large-scale applications. Evaluations with six test collections show consistent improvements over strong baselines.

► We describe a framework for cross-language information retrieval.
► The framework leverages statistical estimation of translation probabilities.
► It models synonymy and bidirectional translation knowledge.
► It yields a balance between retrieval effectiveness and efficiency.
► Evaluations show consistent improvements over strong baselines.
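
The abstract does not spell out how translation probabilities are filtered using bidirectional evidence, so the following is only an illustrative sketch under stated assumptions, not the article's actual procedure. It assumes two hypothetical probability tables, one per translation direction, scores each candidate pair by the product of the two directional probabilities, drops pairs with no support in the reverse direction, and keeps the top candidates up to a cumulative probability threshold. The function name, the product combination, the threshold, and the toy data are all assumptions made for illustration.

```python
def bidirectional_filter(p_f_given_e, p_e_given_f, cumulative_threshold=0.9):
    """Filter translation candidates using evidence from both directions.

    p_f_given_e: dict mapping source term e -> {target term f: p(f|e)}
    p_e_given_f: dict mapping target term f -> {source term e: p(e|f)}
    Both tables and the product-based scoring are illustrative assumptions.
    """
    filtered = {}
    for e, translations in p_f_given_e.items():
        # Score each candidate by the product of the two directional
        # probabilities; pairs unsupported in the reverse direction are dropped.
        scores = {}
        for f, p_forward in translations.items():
            p_backward = p_e_given_f.get(f, {}).get(e, 0.0)
            if p_backward > 0.0:
                scores[f] = p_forward * p_backward
        total = sum(scores.values())
        if total == 0.0:
            continue  # no candidate survived for this source term

        # Normalize, rank by probability, and keep candidates until the
        # cumulative mass reaches the threshold.
        ranked = sorted(
            ((f, s / total) for f, s in scores.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        kept, mass = [], 0.0
        for f, p in ranked:
            kept.append((f, p))
            mass += p
            if mass >= cumulative_threshold:
                break

        # Renormalize the survivors so they form a distribution again.
        kept_total = sum(p for _, p in kept)
        filtered[e] = {f: p / kept_total for f, p in kept}
    return filtered


# Toy tables (invented values, for illustration only).
forward = {"bank": {"banque": 0.6, "rive": 0.3, "de": 0.1}}
backward = {
    "banque": {"bank": 0.9},
    "rive": {"bank": 0.4, "shore": 0.6},
    "de": {"of": 1.0},
}

result = bidirectional_filter(forward, backward)
strict = bidirectional_filter(forward, backward, cumulative_threshold=0.8)
```

In this toy example the candidate "de" is removed because the reverse table gives it no probability of translating back to "bank". Lowering the threshold shortens the candidate list further, which is one way such filtering could trade effectiveness against query-time or indexing-time cost.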

Keywords

statistical machine translation