کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
10342796 696280 2005 18 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Learnable topic-specific web crawler
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر شبکه های کامپیوتری و ارتباطات
پیش نمایش صفحه اول مقاله
Learnable topic-specific web crawler
چکیده انگلیسی
Topic-specific web crawler collects relevant web pages of interested topics from the Internet. There are many previous researches focusing on algorithms of web page crawling. The main purpose of those algorithms is to gather as many relevant web pages as possible, and most of them only detail the approaches of the first crawling. However, no one has ever mentioned some important questions, such as how the crawler performs during the next crawling attempts, can the crawler learn from experience to crawl more relevant web pages in an incremental way, etc. In this paper, we present an algorithm that covers the discussion of both the first and the consecutive crawling. For efficient result of the next crawling, we derive the information of previous crawling attempts to build some knowledge bases: starting URLs, topic keywords and URL prediction. These knowledge bases are used to build the experience of the learnable topic-specific web crawler to produce better result for the next crawling. Preliminary evaluation illustrates that the proposed web crawler can learn from experience to better collect the web pages under interest during the early period of consecutive crawling attempts.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Network and Computer Applications - Volume 28, Issue 2, April 2005, Pages 97-114
نویسندگان
, ,