Article ID | Journal | Published Year | Pages |
---|---|---|---|
497265 | Applied Soft Computing | 2010 | 6 Pages |
Abstract
Focused web crawlers collect topic-related web pages from the Internet. Drawing on Q-learning and semi-supervised learning theory, this study proposes an online semi-supervised clustering approach for topical web crawlers (SCTWC). At each step, SCTWC selects the most topic-related URL to crawl according to the scores of the URLs in the unvisited list. Each score is calculated from the fuzzy class memberships and the Q value of the unlabelled URL. Experimental results show that SCTWC improves crawling performance.
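The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the fuzzy class membership and the Q value are combined, so the weighted sum in `url_score`, the `weight` parameter, and the `Frontier` class are all hypothetical choices made for illustration and are not the authors' formulation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ScoredUrl:
    # heapq is a min-heap, so the negated score is stored to pop the best URL first.
    neg_score: float
    url: str = field(compare=False)


def url_score(membership: float, q_value: float, weight: float = 0.5) -> float:
    """Hypothetical score: a weighted sum of topic membership and Q value.

    This combination rule is an assumption; the abstract only says both
    quantities contribute to the score.
    """
    return weight * membership + (1.0 - weight) * q_value


class Frontier:
    """Unvisited-URL list ordered by score (illustrative only)."""

    def __init__(self) -> None:
        self._heap: list[ScoredUrl] = []

    def add(self, url: str, membership: float, q_value: float) -> None:
        score = url_score(membership, q_value)
        heapq.heappush(self._heap, ScoredUrl(-score, url))

    def pop_best(self) -> str:
        # Return the URL with the highest score and remove it from the list.
        return heapq.heappop(self._heap).url

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier()
    # Membership and Q values below are made-up inputs, not experimental data.
    frontier.add("http://example.com/a", membership=0.9, q_value=0.2)
    frontier.add("http://example.com/b", membership=0.4, q_value=0.4)
    frontier.add("http://example.com/c", membership=0.8, q_value=0.8)
    while len(frontier):
        print(frontier.pop_best())
```

A heap keeps both insertion and best-URL selection at logarithmic cost, which suits an online crawler whose unvisited list grows as pages are fetched; any other priority structure would serve equally well for this sketch.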
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
Huaxiang Zhang, Jing Lu