Topic-specific crawling on the Web with the measurements of the relevancy context graph

Article code	Journal code	Publication year	English article	Full-text version
397073	670680	2006	15-page PDF	Free download

Keywords

WWW; Information retrieval; Document categorization

Related subjects

Engineering and basic sciences, Computer engineering, Artificial intelligence

Abstract

One of the major problems for automatically constructed portals and information discovery systems is how to assign a proper visiting order to unvisited web pages. Topic-specific crawlers and information-seeking agents should avoid traversing off-topic areas and concentrate on links that lead to documents of interest. In this paper, we propose an effective approach based on the relevancy context graph to solve this problem. The graph can estimate the distance and the degree of relevancy between a retrieved document and the given topic. By calculating the word distributions of the general and topic-specific feature words, our method preserves the properties of the relevancy context graph and reflects them in the word distributions. With the help of the topic-specific and general word distributions, our crawler can measure a page's expected relevancy to a given topic and determine the order in which pages should be visited. Simulations are also performed, and the results show that our method outperforms both breadth-first crawling and the method that uses only the context graph.
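
To make the ordering idea in the abstract concrete, the following is a minimal Python sketch of a best-first crawl frontier. It is not the authors' relevancy context graph method: the scoring rule (an average log-likelihood ratio between a topic-specific and a general unigram distribution), the Laplace smoothing, and all names (`word_distribution`, `relevance_score`, `Frontier`, the example.org URLs, the toy documents) are assumptions chosen only for illustration.

```python
import heapq
import math
from collections import Counter


def word_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over vocab (hypothetical helper)."""
    counts = Counter(w for doc in docs for w in doc.lower().split() if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def relevance_score(text, topic_dist, general_dist):
    """Average log-likelihood ratio of in-vocabulary words.

    A positive value means the words are more typical of the topic
    distribution than of the general one. This stands in for the paper's
    relevancy estimate and is an assumption, not the published formula.
    """
    words = [w for w in text.lower().split() if w in topic_dist]
    if not words:
        return 0.0
    return sum(math.log(topic_dist[w] / general_dist[w]) for w in words) / len(words)


class Frontier:
    """Queue of unvisited URLs that pops the highest-scoring URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker that keeps insertion order stable

    def push(self, url, score):
        if url in self._seen:  # ignore URLs that were already queued
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the maximum first
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Toy corpora (illustrative only)
topic_docs = ["web crawler topic crawler links", "crawler frontier links"]
general_docs = ["weather sports news", "news music sports weather"]
vocab = {w for d in topic_docs + general_docs for w in d.split()}
topic_dist = word_distribution(topic_docs, vocab)
general_dist = word_distribution(general_docs, vocab)

frontier = Frontier()
frontier.push("http://example.org/a",
              relevance_score("sports news today", topic_dist, general_dist))
frontier.push("http://example.org/b",
              relevance_score("crawler follows links", topic_dist, general_dist))

# Visit order: highest expected relevancy first
order = [frontier.pop()[0] for _ in range(len(frontier))]
```

In this toy run the link whose surrounding text mentions "crawler" and "links" is visited before the off-topic one. A breadth-first crawler would instead visit the two URLs in the order they were discovered.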

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Systems - Volume 31, Issues 4–5, June–July 2006, Pages 232–246

Authors

Ching-Chi Hsu, Fan Wu
