Article code: 379270
Journal code: 659283
Publication year: 2006
English article: 18 pages (PDF)
Full-text version: Free download
English title of the ISI article
Sampling, information extraction and summarisation of Hidden Web databases
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Hidden Web databases maintain a collection of specialised documents, which are dynamically generated using page templates. This paper presents the Two-Phase Sampling (2PS) technique that detects and extracts query-related information from documents contained in databases. 2PS is based on a two-phase framework for the sampling, information extraction and summarisation of Hidden Web documents. In the first phase, 2PS samples and stores documents for further analysis. In the second phase, it detects Web page templates from sampled documents and extracts relevant information from which a content summary is then generated. Experimental results demonstrate that 2PS effectively eliminates irrelevant information from sampled documents and generates terms and frequencies with improved accuracy.
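
The two phases described in the abstract can be illustrated with a small Python sketch. This is not the 2PS algorithm from the paper: it assumes a toy in-memory "database", a hypothetical `toy_search` function standing in for a real search form, and a deliberately simple template heuristic (a line counts as template text if it appears in at least 80% of the sampled pages). All function names, parameters and the threshold are assumptions made for illustration.

```python
import re
from collections import Counter


def sample_documents(search, seed_term, max_docs=10, top_k=2):
    """Phase 1 (sketch): query-based sampling through a search interface.

    `search` is a hypothetical callable that maps a query term to a list of
    page texts; it stands in for the database's real search form.
    """
    sampled, seen_terms, queue = [], set(), [seed_term]
    while queue and len(sampled) < max_docs:
        term = queue.pop(0)
        if term in seen_terms:
            continue
        seen_terms.add(term)
        for page in search(term)[:top_k]:
            if page not in sampled:
                sampled.append(page)
                # Terms for later queries are drawn from retrieved pages.
                queue.extend(re.findall(r"[a-z]+", page.lower()))
    return sampled[:max_docs]


def detect_template_lines(pages, min_share=0.8):
    """Phase 2a (sketch): treat lines shared by most pages as template text."""
    line_counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        line_counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})
    threshold = min_share * len(pages)
    return {line for line, n in line_counts.items() if n >= threshold}


def summarise(pages, template_lines):
    """Phase 2b (sketch): count terms in the non-template lines only."""
    freqs = Counter()
    for page in pages:
        for line in page.splitlines():
            text = line.strip()
            if text and text not in template_lines:
                freqs.update(re.findall(r"[a-z]+", text.lower()))
    return freqs


# Toy database: every page shares the same header and footer (the "template").
TEMPLATE = "Health Library\n{body}\nCopyright notice"
BODIES = ["heart disease risk", "heart surgery recovery", "diabetes risk factors"]
PAGES = [TEMPLATE.format(body=b) for b in BODIES]


def toy_search(term):
    """Hypothetical search interface: pages containing the term as a word."""
    return [p for p in PAGES if term in p.lower().split()]


sampled = sample_documents(toy_search, "heart")
template = detect_template_lines(sampled)
summary = summarise(sampled, template)
print(summary.most_common())
```

In this toy run the shared header and footer lines are detected as template text, so words such as "health" and "copyright" do not enter the term-frequency summary, while content words such as "heart" and "risk" do.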

Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 59, Issue 2, November 2006, Pages 213–230