Article ID: 563166
Journal: Computer Speech & Language
Published Year: 2012
Pages: 17
File Type: PDF
Abstract

In this paper we present a statistical approach to question answering (QA). Our motivation is to build robust systems for many languages without the need for highly tuned linguistic modules. Consequently, word tokens and web data are used extensively, but neither explicit linguistic knowledge nor annotated data is incorporated. A mathematical model for answer retrieval and answer classification is derived. Experiments are conducted by searching for answers in the AQUAINT corpus as well as in web data. Retrieval from web data, which benefits from the redundancy inherent in the web, outperforms retrieval from a fixed corpus, where there are typically relatively few answer occurrences for any given question. We participated in the TREC 2006 QA evaluations with an implementation of this framework and ranked 9th among 27 participants on the factoid task.

► A statistical, language-independent approach to question answering.
► Exploits redundancy in web data.
► Ranked 9th among 27 participants on the TREC 2006 factoid QA task.
