Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
10338524 | Computer Communications | 2005 | 18 Pages |
Abstract
In this paper, we present a characterization study of search-engine crawlers. For the purposes of our work, we use Web-server access logs from five academic sites in three different countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of the general World-Wide Web traffic and to general characterization studies. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. We propose a set of simple metrics that describe qualitative characteristics of crawler behavior, vis-Ã -vis a crawler's preference on resources of a particular format, its frequency of visits on a Web site, and the pervasiveness of its visits to a particular site. To the best of our knowledge, this is the first extensive and in depth characterization of search-engine crawlers. Our results and observations provide useful insights into crawler behavior and serve as basis of our ongoing work on the automatic detection of Web crawlers.
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Networks and Communications
Authors
Marios D. Dikaiakos, Athena Stassopoulou, Loizos Papageorgiou,