Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
396621 | Information Systems | 2007 | 23 | |
Abstract
The number of vertical search engines and portals has increased rapidly in recent years, making the importance of a topic-driven (focused) crawler self-evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents. Our implementation presents a different approach to focused crawling and aims to overcome the limitation of having to provide initial training data, while maintaining a high recall/precision ratio. We compare its efficiency with that of other well-known web information retrieval techniques.
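The abstract does not spell out how the classifier works, so the following is only a rough illustration of the two ingredients it names, not the authors' method. This standard-library Python sketch builds a rank-k latent semantic indexing (LSI) space from a toy term-document matrix X. The truncated SVD is obtained by power iteration on X X^T. Texts are then folded into that space as q̂ = Σ_k⁻¹ U_kᵀ q. Finally, the text relevance of a link's anchor text is blended with a simple link signal, the mean relevance of already-fetched parent pages, to prioritize crawl-frontier URLs. The corpus, the names `LSIScorer` and `frontier_priority`, the rank `k`, and the weight `alpha` are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Crude tokenizer: lower-case, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


class LSIScorer:
    """Hypothetical helper: a rank-k LSI space built from a small corpus."""

    def __init__(self, corpus, k=2, iters=300, seed=0):
        self.vocab = sorted({t for doc in corpus for t in tokenize(doc)})
        self.index = {t: i for i, t in enumerate(self.vocab)}
        rows = [self.term_vector(doc) for doc in corpus]  # documents x terms
        n = len(self.vocab)
        # C = X X^T for the terms x documents matrix X; its top eigenvectors form U_k.
        C = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
        self.U, self.sigma = self._top_eigvecs(C, k, iters, seed)

    def term_vector(self, text):
        v = [0.0] * len(self.vocab)
        for t, c in Counter(tokenize(text)).items():
            if t in self.index:  # out-of-vocabulary terms are dropped
                v[self.index[t]] = float(c)
        return v

    def project(self, text):
        # Folding-in: q_hat = Sigma_k^{-1} U_k^T q
        q = self.term_vector(text)
        return [dot(u, q) / s for u, s in zip(self.U, self.sigma)]

    def similarity(self, a, b):
        return cosine(self.project(a), self.project(b))

    @staticmethod
    def _top_eigvecs(C, k, iters, seed):
        # Power iteration, orthogonalizing against eigenvectors already found.
        rng = random.Random(seed)
        n = len(C)
        basis, sigma = [], []
        for _ in range(min(k, n)):
            v = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start
            lam = 0.0
            for _ in range(iters):
                w = [dot(C[i], v) for i in range(n)]
                for b in basis:
                    d = dot(w, b)
                    w = [wi - d * bi for wi, bi in zip(w, b)]
                lam = math.sqrt(dot(w, w))  # approaches the eigenvalue of C as v converges
                if lam < 1e-12:
                    break
                v = [wi / lam for wi in w]
            if lam < 1e-12:
                break
            basis.append(v)
            sigma.append(math.sqrt(lam))  # singular value of X
        return basis, sigma


def frontier_priority(scorer, topic, anchor_text, parent_scores, alpha=0.5):
    # Hypothetical blend: text evidence (anchor text vs. topic in LSI space)
    # plus link evidence (mean relevance of fetched pages linking to the URL).
    text_score = scorer.similarity(topic, anchor_text)
    link_score = sum(parent_scores) / len(parent_scores) if parent_scores else 0.0
    return alpha * text_score + (1 - alpha) * link_score


corpus = [
    "focused crawler web search engine crawler",
    "search engine index web documents",
    "latent semantic indexing retrieval documents",
    "football match score league",
    "league football players transfer",
]
scorer = LSIScorer(corpus, k=2)
topic = "web crawler search"
for anchor in ["semantic retrieval documents", "football league transfer"]:
    print(anchor, round(frontier_priority(scorer, topic, anchor, [0.9, 0.7]), 3))
```

In this toy corpus, the anchor "semantic retrieval documents" shares no term with the topic "web crawler search". It still lands on the same latent dimension because the corpus connects those terms through co-occurring documents, so it is ranked above the football anchor even though both have the same parent scores. A real focused crawler would need proper tokenization, term weighting such as tf-idf, and an SVD from a numerical library.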
Authors
G. Almpanidis, C. Kotropoulos, I. Pitas