Article code | Journal code | Publication year | English article | Full-text version
454208 | 695121 | 2011 | 7 pages (PDF) | Free download
English title of the ISI article
Application of structured document parsing to focused web crawling
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Networks and Communications
English abstract

The performance of a focused, or topic-specific, Web robot can be improved by taking into consideration the structure of the documents downloaded by the robot. In the case of HTML, document structure is tree-like, defined by nested document elements (tags) and their attributes. By analysing this structure, a robot may use the text of certain HTML elements to prioritise documents for downloading and thus significantly improve the speed of convergence to a topic. Clear separation of the structure-aware document parser from the download scheduler provides flexibility but requires a standard interface and protocol between the two. The paper discusses such an interface in the context of an experimental Web robot, whose speed of convergence to a topic was observed to increase by a factor of 3 to 8, as measured by the number of documents downloaded to reach a given average relevance score.
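
The separation between a structure-aware parser and a download scheduler described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the interface from the paper: the class names `StructureAwareParser` and `DownloadScheduler`, the methods `scored_links`, `offer`, and `next_url`, the topic terms, and the element weights are all hypothetical. The parser scores each outgoing link from the text of selected HTML elements (here `title`, `h1`, and the link's own anchor text), and the scheduler only ever sees `(url, score)` pairs.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical topic vocabulary and per-element weights (not taken from the paper).
TOPIC_TERMS = {"crawler", "robot", "spider"}
ELEMENT_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.0}


def relevance(text):
    """Fraction of the words in `text` that are topic terms (0.0 if empty)."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


class StructureAwareParser(HTMLParser):
    """Scores outgoing links using the text of selected HTML elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.open_tags = []       # stack of open elements that carry a weight
        self.page_score = 0.0     # contribution of title/h1 text
        self.links = []           # entries of the form [absolute_url, anchor_text]
        self.current_link = None  # entry for the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ELEMENT_WEIGHTS:
            self.open_tags.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.current_link = [urljoin(self.base_url, href), ""]
                self.links.append(self.current_link)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        if tag == "a":
            self.current_link = None

    def handle_data(self, data):
        if self.current_link is not None:
            self.current_link[1] += data  # accumulate anchor text
        elif self.open_tags:
            # Text inside title/h1 raises the score of every link on the page.
            self.page_score += ELEMENT_WEIGHTS[self.open_tags[-1]] * relevance(data)

    def scored_links(self):
        """The parser-to-scheduler interface in this sketch: (url, score) pairs."""
        return [(url, self.page_score + ELEMENT_WEIGHTS["a"] * relevance(text))
                for url, text in self.links]


class DownloadScheduler:
    """Priority queue of URLs; knows nothing about HTML structure."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker so equal scores keep insertion order

    def offer(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def next_url(self):
        """Return the highest-scoring URL not yet handed out, or None."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

In use, a crawler loop would feed each downloaded page to a fresh `StructureAwareParser`, pass every pair from `scored_links()` to `DownloadScheduler.offer`, and fetch whatever `next_url()` returns next. Because the scheduler depends only on the `(url, score)` pairs, the scoring rules can be changed without touching the download logic, which is the kind of flexibility the abstract attributes to the separation.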

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Standards & Interfaces - Volume 33, Issue 3, March 2011, Pages 325–331