Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6900709 | Procedia Computer Science | 2018 | 7 Pages |
Abstract
Finding information on the Web is difficult because of the extremely large volume of data. Search engines can facilitate this task, but it is still difficult to cover all the webpages present on the Web. This paper proposes a query-based crawler in which a set of keywords relevant to the user's topic of interest is used to issue queries to search interfaces. These search interfaces are found on the webpages of the website corresponding to the seed URL. This helps the crawler obtain the most relevant links from the domain without crawling the domain in depth. No existing focused crawling approach uses a query-based method to find webpages of interest. In the proposed crawler, a list of keywords is passed to the search query interfaces found on the websites. The proposed work gives the most relevant information for the keywords in a particular domain without crawling through the many irrelevant links in between.
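The idea of passing keywords to a site's own search interface can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SearchFormFinder` and `build_query_urls` are hypothetical, and the sketch assumes that a search interface is a GET form with a single text-like input and that the form's `action` carries no query string of its own. It only builds the query URLs; fetching and ranking the results are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Collects (action URL, field name) pairs for GET forms with a text-like input."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.forms = []       # list of (absolute action URL, query field name)
        self._action = None   # action of the GET form currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            method = (attrs.get("method") or "get").lower()
            if method == "get":
                # Resolve a relative action against the seed page URL.
                self._action = urljoin(self.base_url, attrs.get("action") or "")
            else:
                self._action = None  # assumption: POST forms are skipped
        elif tag == "input" and self._action is not None:
            kind = (attrs.get("type") or "text").lower()
            name = attrs.get("name")
            if kind in ("text", "search") and name:
                self.forms.append((self._action, name))
                self._action = None  # keep only the first text field per form

    def handle_endtag(self, tag):
        if tag == "form":
            self._action = None


def build_query_urls(html, base_url, keywords):
    """Return one query URL per (search form, keyword) pair found in the page."""
    finder = SearchFormFinder(base_url)
    finder.feed(html)
    return [
        f"{action}?{urlencode({field: keyword})}"
        for action, field in finder.forms
        for keyword in keywords
    ]
```

Given the HTML of a seed page, `build_query_urls(html, seed_url, ["focused crawler"])` would return the URLs a crawler could fetch to reach keyword-relevant pages directly, rather than following links level by level.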
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)
Authors
Manish Kumar, Ankit Bindal, Robin Gautam, Rajesh Bhatia