Article code: 6934573
Journal code: 1449512
Publication year: 2018
English article: 5 pages (PDF)
English title of the ISI article
New document scoring model based on interval tree
Persian translation of the title
مدل ارزیابی سند جدید بر اساس درخت فاصله
Keywords
Interval scoring, field weights, term frequency, interval tree, text visualization
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract
Classical BM25 scoring is designed for unstructured documents. In recent years, researchers have tried to adapt the BM25 ranking formula to structured documents. Most work on structured document retrieval addresses the combination of field scores, but it is hard to determine the field weights before the document score is formed. We aim to establish a new method for determining the field weights. The motivation comes from two aspects. On the one hand, the construction of the interval tree reflects retrieval results with higher-order proximity for a text field. By the conventions of writing style, the important sentences or phrases that express the main idea frequently appear in the front or rear part of a text field. Proximity scoring should therefore differ across the parts of a text field, so we apply a higher factor when calculating proximity scores in the front and rear parts than in the middle part. On the other hand, the longer the interval that contains the query terms, the lower the proximity score; accordingly, a higher tf value for a term appearing in an interval should affect the computation of the proximity score. We therefore develop a new method for calculating the field weights based on the ranking score. The ranking score for each field can be calculated with an interval tree based on term relevance. The interval tree can be viewed as a tool for representing higher-order term proximity in text visualization. The new field weights reflect term proximity and can be used to calculate document scores for term retrieval. Experimental results show that the new document scoring model reflects term proximity well, and that the new document scoring scheme ScoreComp, combined with interval scoring, is more sensitive than the scheme FreqComp combined with interval scoring.
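The two ideas in the abstract (a higher factor near the front and rear of a field, and a lower score for longer intervals) can be illustrated with a rough sketch. This is not the paper's formula, which the abstract does not give: the function names, the `edge_frac` and `edge_boost` parameters, and the "terms covered divided by interval length" score are all assumptions made for illustration, and a simple sliding window stands in for the interval tree.

```python
from typing import Optional, Sequence, Tuple


def smallest_cover(tokens: Sequence[str], query: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the shortest window containing every query term, or None.

    Sliding-window stand-in for the interval tree mentioned in the abstract.
    """
    need = set(query)
    if not need:
        return None
    counts = {}  # query term -> occurrences inside the current window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in need:
            counts[tok] = counts.get(tok, 0) + 1
        # Shrink from the left while the window still covers all query terms.
        while len(counts) == len(need):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            ltok = tokens[left]
            if ltok in need:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    del counts[ltok]
            left += 1
    return best


def position_factor(start: int, end: int, field_len: int,
                    edge_frac: float = 0.2, edge_boost: float = 1.5) -> float:
    """Hypothetical factor: boost intervals centred in the front or rear part of the field."""
    center = (start + end) / 2 / max(field_len - 1, 1)
    if center <= edge_frac or center >= 1 - edge_frac:
        return edge_boost
    return 1.0


def interval_score(start: int, end: int, n_query_terms: int, field_len: int) -> float:
    """Hypothetical proximity score: shorter intervals and edge positions score higher."""
    length = end - start + 1
    return position_factor(start, end, field_len) * n_query_terms / length


tokens = ["a", "x", "b", "c", "c", "c", "c", "c", "a", "b"]
cover = smallest_cover(tokens, ["a", "b"])
if cover is not None:
    print(cover, interval_score(cover[0], cover[1], 2, len(tokens)))
```

Under these assumed parameters, an interval of the same length scores higher near either end of the field than in the middle, and a longer interval covering the same query terms scores lower than a shorter one.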
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Visual Languages & Computing - Volume 45, April 2018, Pages 39-43