Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
461854 | 696638 | 2013 | 11-page PDF | Free download |
A web crawler is an important research component of a search engine. This paper proposes a new method for measuring the similarity of formal concept analysis (FCA) concepts and a new notion of a web page's rank, both using an information content approach based on users’ web logs. First, an extension similarity and an intension similarity are proposed that analyze a user's browsing pattern and hyperlinks. Second, the information content similarity between two nouns is computed automatically by examining their ISA and Part-Of hierarchy and using a user's web log. Third, a method is proposed for computing the semantic similarity between two concepts in two different concept lattices (the base concept lattice and the current concept lattice) and for finding the semantic ranking of web pages. Last, the experiment demonstrates that the crawler is more suitable for crawling focused web pages, and that the semantic ranking of web pages is a useful and efficient basis for the crawler's choice of which web page to continue with.
► A web page's semantic ranking is defined based on users’ web logs.
► Extension and intension similarities are defined.
► The information content similarity between two nouns is computed automatically.
► We develop the semantic similarity between two concepts in different concept lattices.
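
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a concept is an (extent, intent) pair of sets, substitutes a plain Jaccard overlap for the extension and intension similarities, and substitutes a Resnik-style score (the information content of the lowest common ISA ancestor, with frequencies taken from a web log) for the information content similarity. All function names, the `weight` parameter, and the `parent` mapping are hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Overlap of two sets; treated as 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def concept_similarity(c1, c2, weight=0.5):
    """Blend extent (page) overlap and intent (attribute) overlap.

    Each concept is an (extent, intent) pair. `weight` is a hypothetical
    tuning parameter, not a value taken from the paper.
    """
    (extent1, intent1), (extent2, intent2) = c1, c2
    return weight * jaccard(extent1, extent2) + (1 - weight) * jaccard(intent1, intent2)


def ancestors(term, parent):
    """Return the term followed by its ISA ancestors, nearest first.

    Assumes `parent` (child -> parent) describes a tree with no cycles.
    """
    chain = [term]
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def subsumer_counts(log_terms, parent):
    """Count each logged term toward itself and every one of its ancestors."""
    counts = Counter()
    for term in log_terms:
        counts.update(ancestors(term, parent))
    return counts


def ic_similarity(t1, t2, parent, counts, total):
    """Resnik-style score: information content of the lowest common ancestor.

    Information content is log(total / count), i.e. -log of the estimated
    probability. Returns 0.0 when the terms share no counted ancestor.
    """
    seen = set(ancestors(t2, parent))
    for candidate in ancestors(t1, parent):
        if candidate in seen and counts[candidate] > 0:
            return math.log(total / counts[candidate])
    return 0.0


if __name__ == "__main__":
    # Toy ISA hierarchy and web-log terms, invented for illustration only.
    parent = {"laptop": "computer", "desktop": "computer",
              "computer": "device", "phone": "device"}
    log_terms = ["laptop", "laptop", "desktop", "phone"]
    counts = subsumer_counts(log_terms, parent)
    print(ic_similarity("laptop", "desktop", parent, counts, len(log_terms)))
    print(concept_similarity(({"p1", "p2"}, {"news", "sports"}),
                             ({"p2", "p3"}, {"sports"})))
```

In this sketch, two nouns that meet only at the root of the hierarchy score 0.0, because the root subsumes every logged term and so carries no information. A crawler could, under the same assumptions, rank a candidate page by comparing its concept in the current lattice against concepts in the base lattice with `concept_similarity`.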
Journal: Journal of Systems and Software - Volume 86, Issue 1, January 2013, Pages 187–197