Article code: 453452
Journal code: 694856
Publication year: 2007
English article: 16 pages, PDF
Full-text version: free download
English title of the ISI article
Logical structure analysis: From HTML to XML
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications
Abstract

This paper presents an efficient method for extracting a logical structure from a Web document. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a logical structure more accurately, the proposed method defines a document model that is able to describe logical structure information of a specific document class. Since the proposed method is based on a visual structure from the visual grouping phase as well as a document model that describes logical structure information of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method has performed logical structure analysis successfully, compared with previous work. Particularly, the method generates XML documents as the result of structure analysis, so that it enhances the reusability of documents.
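
The three phases named in the abstract can be pictured with a toy pipeline. The sketch below is not the paper's algorithm: it assumes that "visual grouping" can be approximated by collecting block-level HTML tags, and that the document model is a simple tag-to-name mapping. `DOCUMENT_MODEL`, `BlockCollector`, `identify_elements`, `group_logically`, and `html_to_xml` are all hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical document model: maps HTML tags to logical element names.
# A real model for a document class would be far richer than this.
DOCUMENT_MODEL = {"h1": "title", "h2": "heading", "p": "paragraph"}


class BlockCollector(HTMLParser):
    """Phase 1 (toy visual grouping): collect the text of each modeled block tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # list of (tag, text) pairs in document order
        self._tag = None   # block tag currently open, if any
        self._text = []    # text fragments seen inside the open block

    def handle_starttag(self, tag, attrs):
        if tag in DOCUMENT_MODEL:
            self._tag, self._text = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.blocks.append((tag, "".join(self._text).strip()))
            self._tag = None


def identify_elements(blocks):
    """Phase 2: label each block with a logical element name from the model."""
    return [(DOCUMENT_MODEL[tag], text) for tag, text in blocks]


def group_logically(elements):
    """Phase 3: open a new section at each heading and nest what follows in it."""
    root = ET.Element("document")
    parent = root
    for name, text in elements:
        if name == "heading":
            parent = ET.SubElement(root, "section")
        ET.SubElement(parent, name).text = text
    return root


def html_to_xml(html):
    """Run the three toy phases and return the XML as a string."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    elements = identify_elements(collector.blocks)
    return ET.tostring(group_logically(elements), encoding="unicode")
```

Calling `html_to_xml("<h1>T</h1><p>intro</p><h2>A</h2><p>x</p>")` yields a `document` element whose `title` and first `paragraph` are direct children, followed by a `section` that holds the heading and its paragraph. The paper's method relies on rendered visual structure rather than tag names, so this only conveys the overall flow from HTML to XML.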

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Standards & Interfaces - Volume 29, Issue 1, January 2007, Pages 109–124
Authors
, , ,