Article code: 514986
Journal code: 866931
Publication year: 2012
English article: 18-page PDF
Full-text version: free download
English title of the ISI article
Index ordering by query-independent measures
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
English abstract

Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find the documents with the highest scores. For particularly large collections this may be extremely time consuming. A solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process, while also limiting the loss in retrieval efficacy (in terms of accuracy of results). We achieve this by first identifying the most “important” documents within the collection and then sorting the documents within each inverted file list in order of this “importance”. At query time the less important documents at the tail of each list can be skipped, which reduces the amount of information to be searched and so makes the search more efficient while limiting the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings in the number of postings examined, without significant loss of effectiveness, for several measures of importance used both in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
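
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy documents, the `importance` values, the raw term-frequency scoring, and the helper names `build_index` and `search` are all assumptions made for illustration. Each postings list is sorted by a query-independent importance score, and at query time only the leading `fraction` of each list is examined.

```python
import math
from collections import defaultdict

def build_index(docs, importance):
    """Build an inverted index with postings sorted by descending importance."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        terms = text.lower().split()
        for term in set(terms):
            # Each posting is (document id, term frequency).
            index[term].append((doc_id, terms.count(term)))
    for postings in index.values():
        # Query-independent ordering: most "important" documents first.
        postings.sort(key=lambda p: importance[p[0]], reverse=True)
    return dict(index)

def search(index, query, fraction=1.0, k=10):
    """Score documents using only the leading `fraction` of each postings list."""
    scores = defaultdict(float)
    examined = 0
    for term in query.lower().split():
        postings = index.get(term, [])
        cutoff = math.ceil(len(postings) * fraction)
        for doc_id, tf in postings[:cutoff]:
            scores[doc_id] += tf  # toy scoring (assumption): raw term frequency
            examined += 1
    # Rank by score, breaking ties by document id for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k], examined

# Hypothetical toy collection and importance scores.
docs = {
    "d1": "index pruning for search",
    "d2": "search engines and search speed",
    "d3": "index ordering",
    "d4": "search index",
}
importance = {"d1": 0.9, "d2": 0.2, "d3": 0.7, "d4": 0.5}

index = build_index(docs, importance)
full_ranking, full_examined = search(index, "search index", fraction=1.0)
pruned_ranking, pruned_examined = search(index, "search index", fraction=0.5)
```

In this toy run the pruned search examines fewer postings than the full search while still returning the most important matching document first. How much accuracy is lost in practice depends on the importance measure and the cut-off, which is what the paper's experiments evaluate.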


► Use of query-independent measures to prune the size of the inverted index.
► Use of the same measures to improve the performance of a non-pruned index.
► Effective combination of a number of query-independent measures.
► Evaluation of performance directly on the inverted index (using IP measure).
► Achieving an improvement in P10 performance with only 15% of the inverted index.
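
The highlights mention combining several query-independent measures, but this page does not say how the combination is done. The Python sketch below shows one common possibility, a weighted sum of min-max normalised scores, purely as an assumption. The measure names `link_score` and `length_score`, their values, and the weights are hypothetical placeholders, not the measures used in the paper.

```python
def min_max(values):
    """Rescale a {doc_id: raw_score} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in values}
    return {doc_id: (v - lo) / (hi - lo) for doc_id, v in values.items()}

def combine(measures, weights):
    """Weighted sum of normalised measures (an assumed combination scheme)."""
    normalised = {name: min_max(vals) for name, vals in measures.items()}
    doc_ids = next(iter(measures.values())).keys()
    return {
        doc_id: sum(weights[name] * normalised[name][doc_id] for name in measures)
        for doc_id in doc_ids
    }

# Hypothetical measures and weights for three documents.
measures = {
    "link_score": {"d1": 10, "d2": 0, "d3": 5},
    "length_score": {"d1": 100, "d2": 300, "d3": 200},
}
weights = {"link_score": 0.75, "length_score": 0.25}

combined = combine(measures, weights)
# Documents in the order they would appear in each postings list.
order = sorted(combined, key=combined.get, reverse=True)
```

The resulting `combined` scores could then serve as the single importance value by which postings lists are sorted.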

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 48, Issue 3, May 2012, Pages 569–586