Application of structured document parsing to focused web crawling

Article code	Journal code	Publication year	English article	Full-text version
454208	695121	2011	7-page PDF	Free download

Keywords

Focused Web crawler; Robot; Information structure; Attribute; Structural element; Spider

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications

Abstract

The performance of a focused, or topic-specific Web robot can be improved by taking into consideration the structure of the documents downloaded by the robot. In the case of HTML, document structure is tree-like, defined by nested document elements (tags) and their attributes. By analysing this structure, a robot may use the text of certain HTML elements to prioritise documents for downloading and thus significantly improve the speed of convergence to a topic. Clear separation of the structure-aware document parser from the download scheduler provides flexibility but requires a standard interface and protocol between the two. The paper discusses such an interface in the context of an experimental Web robot, whose speed of convergence to a topic was observed to increase by a factor of 3 to 8, as measured by the number of documents downloaded to reach a given average relevance score.
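
The separation described in the abstract, a structure-aware parser feeding a download scheduler through a narrow interface, can be sketched roughly as follows. This is a minimal Python illustration and not the authors' implementation: the names `StructureAwareParser`, `Scheduler`, and `prioritise`, the choice of `title`, `h1`, and anchor text as signals, the weights in `WEIGHTS`, and the term-counting relevance measure are all assumptions made for the example.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical element weights for illustration; not taken from the paper.
WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.0}
PAGE_ELEMENTS = ("title", "h1")


class StructureAwareParser(HTMLParser):
    """Collects the text of selected elements and the anchor text of each link."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.open_tags = []                        # tracked elements currently open
        self.text = {t: [] for t in PAGE_ELEMENTS}  # element name -> text chunks
        self.links = []                            # (absolute URL, anchor text)
        self._href = None
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        if tag in PAGE_ELEMENTS:
            self.open_tags.append(tag)
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            url = urljoin(self.base_url, self._href)
            self.links.append((url, " ".join(self._anchor)))
            self._href = None
        if tag in self.open_tags:
            # Close the most recent matching element and anything nested in it.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx:]

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk:
            return
        for tag in set(self.open_tags):
            self.text[tag].append(chunk)
        if self._href is not None:
            self._anchor.append(chunk)


def relevance(text, topic_terms):
    """Counts words in `text` that belong to the topic vocabulary."""
    words = (w.strip(".,;:!?") for w in text.lower().split())
    return sum(1 for w in words if w in topic_terms)


class Scheduler:
    """Download queue that returns the highest-priority unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._count = 0  # tie-breaker that keeps insertion order stable

    def offer(self, url, priority):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, self._count, url))
        self._count += 1

    def next_url(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def prioritise(base_url, html, topic_terms, scheduler):
    """Parser-to-scheduler interface: one (URL, priority) offer per link."""
    parser = StructureAwareParser(base_url)
    parser.feed(html)
    page_score = sum(
        WEIGHTS[t] * relevance(" ".join(parser.text[t]), topic_terms)
        for t in PAGE_ELEMENTS
    )
    for url, anchor in parser.links:
        scheduler.offer(url, page_score + WEIGHTS["a"] * relevance(anchor, topic_terms))
```

In this sketch the scheduler sees only URLs and numeric priorities, so the parser could be replaced, for instance by one for a different document format, without changing the download queue.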

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Standards & Interfaces - Volume 33, Issue 3, March 2011, Pages 325–331

Authors

Ahmed Patel, Nikita Schmidt
