Article ID: 4946865
Journal: Neurocomputing
Published Year: 2017
Pages: 36
File Type: PDF
Abstract
The number of webpages on the Internet continues to increase rapidly, and countless webpages focus on user-related issues. Finding useful information on such webpages is a real challenge in information retrieval and search engine management. Building on the Link Context Graph (LCG), Relevancy Context Graph (RCG) and Concept Context Graph (CCG), we propose and construct a Path Trust Knowledge Graph (PTKG) model for assigning priority values to unvisited webpages. The PTKG model guides a focused crawler to retrieve more webpages that are relevant to user-specific topics. For a given user-specific topic t, constructing and applying the PTKG involves six steps: (1) building the context graph G(t)=(V,E), where V is the set of previously crawled webpages and E is the set of hyperlinks among them, (2) retrieving the knowledge implied in the paths among these webpages and finding the path lengths, (3) computing the trust degrees of webpages, (4) constructing topic-specific language models using the trust degrees, (5) establishing general language models, and (6) assigning priority values to webpages for ranking purposes. A framework for topic-specific web crawlers based on the PTKG is constructed to collect webpages on user-specific topics. Finally, we experimentally compare the proposed PTKG approach with the classic LCG and RCG approaches and show that our method outperforms both.
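Steps (1) to (3) of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the function names, the input format (a dict mapping each crawled URL to its outgoing links), the use of breadth-first search for path lengths, and in particular the geometric decay used as a stand-in for the trust degree. It is not the authors' method.

```python
from collections import deque


def build_context_graph(pages):
    """Step (1): build G(t) = (V, E) from crawl history.

    `pages` is a hypothetical input format: dict of url -> list of outgoing urls.
    Only hyperlinks between already-crawled pages are kept in E.
    """
    V = set(pages)
    E = {(src, dst) for src, links in pages.items() for dst in links if dst in V}
    return V, E


def path_lengths_to_targets(V, E, targets):
    """Step (2): fewest hyperlinks from each page to any on-topic target page.

    Runs a breadth-first search over reversed hyperlinks, starting from the
    target pages. Pages that cannot reach a target are absent from the result.
    """
    reverse = {v: [] for v in V}
    for src, dst in E:
        reverse[dst].append(src)
    dist = {t: 0 for t in targets if t in V}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for pred in reverse[node]:
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist


def trust_degree(path_length, decay=0.5):
    """Step (3), ASSUMED form: trust shrinks geometrically with path length.

    The paper's actual trust-degree definition is not stated in the abstract.
    """
    return decay ** path_length


# Tiny hypothetical crawl history: "t" is a page known to be on topic.
pages = {"a": ["b", "c"], "b": ["t"], "c": [], "t": []}
V, E = build_context_graph(pages)
lengths = path_lengths_to_targets(V, E, targets={"t"})
trust = {url: trust_degree(d) for url, d in lengths.items()}
```

In this toy history, page "c" never reaches the target and so receives no trust value; a crawler built along these lines would need a default for such pages, which is again a design choice not covered by the abstract.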
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence