کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
395534 665989 2011 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Modeling term proximity for probabilistic information retrieval models
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Modeling term proximity for probabilistic information retrieval models
چکیده انگلیسی

Proximity among query terms has been found to be useful for improving retrieval performance. However, its application to classical probabilistic information retrieval models, such as Okapi’s BM25, remains a challenging research problem. In this paper, we propose to improve the classical BM25 model by utilizing the term proximity evidence. Four novel methods, namely a window-based N-gram Counting method, Survival Analysis over different statistics, including the Poisson process, an exponential distribution and an empirical function, are proposed to model the proximity between query terms. Through extensive experiments on standard TREC collections, our proposed proximity-based BM25 model, called BM25P, is compared to strong state-of-the-art evaluation baselines, including the original unigram BM25 model, the Markov Random Field model, and the positional language model. According to the experimental results, the window-based N-gram Counting method, and Survival Analysis over an exponential distribution are the most effective among all four proposed methods, which lead to marked improvement over the baselines. This shows that the use of term proximity considerably enhances the retrieval effectiveness of the classical probabilistic models. It is therefore recommended to deploy a term proximity component in retrieval systems that employ probabilistic models.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volume 181, Issue 14, 15 July 2011, Pages 3017–3031
نویسندگان
, , ,