Article code: 407025
Journal code: 678124
Publication year: 2014
English article: 17 pages, PDF
Full-text version: free download
English title of the ISI article
Estimating retrievability ranks of documents using document features
Persian translation of the title
ارزیابی صفات بازیابی مجدد اسناد با استفاده از ویژگی های سند
Keywords
Retrieval model evaluation; retrieval system bias analysis; document findability; patent retrieval
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract


• Retrievability is used for analyzing retrieval bias of retrieval models.
• Measuring retrievability requires processing an exhaustive number of queries.
• We propose a query-independent approach that estimates the retrievability of documents from document features.
• We propose three classes of features: surface-level features, term-weighting features, and density around the nearest neighbors of documents.
• Experiments indicate that our approach predicts the retrievability of documents more efficiently, with less processing time and fewer resources.
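
The query-based measurement that the highlights describe as costly can be sketched as follows. This is a minimal illustration, assuming the commonly used cumulative definition of retrievability (a document scores one point each time it appears within a rank cutoff c for a query); the page itself does not spell out the formula. The function name `retrievability`, the cutoff value, and the toy result lists are hypothetical.

```python
from collections import defaultdict


def retrievability(ranked_results, cutoff):
    """Count how often each document appears in the top `cutoff`
    positions across all query result lists (cumulative measure)."""
    scores = defaultdict(int)
    for ranking in ranked_results.values():
        for doc_id in ranking[:cutoff]:
            scores[doc_id] += 1
    return dict(scores)


# Hypothetical toy data: query id -> ranked list of document ids.
toy_results = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d2"],
    "q3": ["d2", "d1", "d4"],
}

r = retrievability(toy_results, cutoff=2)
print(r)
```

The cost is visible in the outer loop: every query must first be run through the retrieval system to obtain its ranking, which is why an exhaustive query set makes this measurement expensive. Documents that never reach the cutoff (here `d4`) simply receive no score.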

Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest in recall-oriented retrieval domains such as patent or legal retrieval, because if a retrieval system in these domains makes some documents hard to find, professional searchers will have difficulty retrieving them. One main limitation of retrievability analysis is that it depends on processing an exhaustive number of queries, which requires substantial processing time and resources. To address this problem, in this paper we use a document-feature-based approach to estimate the retrievability ranks of documents. In experiments, the strong correlation between features and retrievability scores on different collections confirms that it is possible to estimate the retrievability ranks of documents without processing queries. One major advantage of this approach is that it requires fewer resources and can be computed more quickly than the query-based approach. One major disadvantage is that it can only estimate the retrievability ranks of documents; it cannot quantify the retrievability inequality (retrieval bias) among the documents of a collection.
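
The idea of checking whether a document feature tracks retrievability can be sketched with a rank correlation. The abstract only says "strong correlation"; using Spearman's coefficient here is an assumption, as are the feature (document length), the helper names, and the toy numbers, none of which come from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes both inputs contain at least two distinct values."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: one feature value and one query-based
# retrievability score per document.
doc_length = [120, 300, 80, 450]
retrievability_score = [5, 9, 2, 14]

rho = spearman(doc_length, retrievability_score)
print(rho)
```

A coefficient near 1 or -1 would mean the feature alone orders documents much as the query-based scores do, which is the sense in which ranks can be estimated without running queries. It says nothing about the size of the gaps between scores, consistent with the stated limitation that retrieval bias itself cannot be computed this way.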

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 123, 10 January 2014, Pages 216–232