Learning to crawl deep web

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
396925	670631	2013	19 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Hidden web Reinforcement learning - یادگیری تقویتی

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش صفحه اول مقاله

چکیده انگلیسی

Deep web or hidden web refers to the hidden part of the Web (usually residing in structured databases) that remains unavailable for standard Web crawlers. Obtaining content of the deep web is challenging and has been acknowledged as a significant gap in the coverage of search engines. The paper proposes a novel deep web crawling framework based on reinforcement learning, in which the crawler is regarded as an agent and deep web database as the environment. The agent perceives its current state and selects an action (query) to submit to the environment (the deep web database) according to Q-value. While the existing methods rely on an assumption that all deep web databases possess full-text search interfaces and solely utilize the statistics (TF or DF) of acquired data records to generate the next query, the reinforcement learning framework not only enables crawlers to learn a promising crawling strategy from its own experience, but also allows for utilizing diverse features of query keywords. Experimental results show that the method outperforms the state of art methods in terms of crawling capability and relaxes the assumption of full-text search implied by existing methods.

► We introduce a reinforcement learning framework for deep web surfacing.
► We propose a surfacing algorithm for both full text and non-full text databases.
► The crawler learns to differentiate rewarding queries from unpromising ones.
► A Q-value approximation algorithm is developed to enable future reward estimation.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 38, Issue 6, September 2013, Pages 801–819

نویسندگان

Qinghua Zheng, Zhaohui Wu, Xiaocheng Cheng, Lu Jiang, Jun Liu,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Learning to crawl deep web

دسترسی سریع

ارتباط

English Website