Article code: 402603
Journal code: 676968
Publication year: 2015
English article: 18 pages (PDF)
English title of the ISI article
AutoRM: An effective approach for automatic Web data record mining
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract

A Web database typically responds to a query with a Web page that encodes the query results as semi-structured data objects using HTML tags. We call such data objects Web data records, or simply data records. Mining Web data records is important for many applications, such as meta search and comparative shopping. This paper proposes a new approach called AutoRM, which automatically mines data records from a single Web page. AutoRM involves three major steps: (1) constructing the DOM tree of the given Web page; (2) mining all sets of adjacent similar C-Records (Candidate data Records) from the constructed DOM tree; (3) mining actual data records from the C-Records. In many Web pages, similar data records are distributed within larger, adjacent, similar objects. Existing approaches typically identify such objects as data records. In contrast, AutoRM treats such objects as C-Records and mines the actual data records from them. One key issue in mining similar data records is detecting the boundary of each data record. Existing approaches typically rely on brittle assumptions to handle this issue. By making more robust assumptions, AutoRM tends to detect data record boundaries more accurately. Experimental results show that AutoRM is highly effective and outperforms state-of-the-art approaches.
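The first two steps named in the abstract can be illustrated with a small sketch. This is not AutoRM's actual algorithm, which the abstract does not specify. It assumes a tags-only DOM built with Python's standard `html.parser`, and it swaps in a simple similarity test (the `difflib.SequenceMatcher` ratio over pre-order tag sequences) to group adjacent siblings. The names `Node`, `DomBuilder`, `find_candidate_groups`, and the `threshold` and `min_size` parameters are all hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A DOM node that keeps only its tag name and children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def tag_sequence(self):
        """Return the pre-order list of tag names in this subtree."""
        seq = [self.tag]
        for child in self.children:
            seq.extend(child.tag_sequence())
        return seq


class DomBuilder(HTMLParser):
    """Step 1 (simplified): build a tags-only DOM tree, ignoring text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def similar(a, b, threshold):
    """Assumed similarity test: ratio of matching tags in the two subtrees."""
    ratio = SequenceMatcher(None, a.tag_sequence(), b.tag_sequence()).ratio()
    return ratio >= threshold


def find_candidate_groups(node, threshold=0.8, min_size=2):
    """Step 2 (simplified): collect runs of adjacent, similar sibling subtrees."""
    groups, run = [], []
    for child in node.children:
        if run and similar(run[-1], child, threshold):
            run.append(child)
        else:
            if len(run) >= min_size:
                groups.append(run)
            run = [child]
    if len(run) >= min_size:
        groups.append(run)
    for child in node.children:
        groups.extend(find_candidate_groups(child, threshold, min_size))
    return groups


html = """
<ul>
  <li><a href="#">A</a><span>1</span></li>
  <li><a href="#">B</a><span>2</span></li>
  <li><a href="#">C</a><span>3</span></li>
</ul>
<p>footer</p>
"""
builder = DomBuilder()
builder.feed(html)
groups = find_candidate_groups(builder.root)
print([[n.tag for n in group] for group in groups])
```

The sketch stops at candidate groups. The third step, mining actual data records from C-Records and detecting their boundaries, depends on assumptions that the abstract mentions but does not describe, so it is not reproduced here.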

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 89, November 2015, Pages 314–331