کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
10338524 693704 2005 18 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
An investigation of web crawler behavior: characterization and metrics
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر شبکه های کامپیوتری و ارتباطات
پیش نمایش صفحه اول مقاله
An investigation of web crawler behavior: characterization and metrics
چکیده انگلیسی
In this paper, we present a characterization study of search-engine crawlers. For the purposes of our work, we use Web-server access logs from five academic sites in three different countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of the general World-Wide Web traffic and to general characterization studies. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. We propose a set of simple metrics that describe qualitative characteristics of crawler behavior, vis-à-vis a crawler's preference on resources of a particular format, its frequency of visits on a Web site, and the pervasiveness of its visits to a particular site. To the best of our knowledge, this is the first extensive and in depth characterization of search-engine crawlers. Our results and observations provide useful insights into crawler behavior and serve as basis of our ongoing work on the automatic detection of Web crawlers.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computer Communications - Volume 28, Issue 8, 16 May 2005, Pages 880-897
نویسندگان
, , ,