Article code: 725629
Journal code: 1461275
Publication year: 2008
English article: 5 pages, PDF
Full-text version: free download
English title of the ISI article
Filtering noise in Web pages based on parsing tree
Related topics
Engineering and Basic Sciences, Other Engineering Disciplines, Electrical and Electronic Engineering
English abstract

This paper proposes a novel method for filtering noise in web pages using a parsing tree. First, it explains how the features of noise in web pages can be analyzed and extracted. Second, it explains how the parsing tree of a web page can be built using the document object model (DOM). Finally, it explains how domain-specific extraction rules and statistical methods can be deployed to eliminate noise and extract the main text from web pages. A simulation is conducted, and the results show the applicability and feasibility of the proposed method.
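
The abstract names three steps (noise features, a DOM parse tree, and rule-based plus statistical filtering) but does not give the actual rules or statistics. The following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical tag blacklist (`NOISE_TAGS`) as a stand-in for extraction rules and a link-text density threshold (`max_link_density`) as a stand-in for the statistical test. All names are illustrative, and the tree is built with Python's standard `html.parser` rather than a full DOM implementation.

```python
from html.parser import HTMLParser

# Assumption: subtrees under these tags are always treated as noise.
NOISE_TAGS = {"head", "script", "style", "nav", "footer", "aside", "form"}
# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """One element of a simplified parse tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree from HTML markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def text_length(node):
    """Total number of text characters in the subtree."""
    return sum(len(c) if isinstance(c, str) else text_length(c)
               for c in node.children)


def link_text_length(node):
    """Number of text characters in the subtree that sit inside <a> elements."""
    if node.tag == "a":
        return text_length(node)
    return sum(link_text_length(c) for c in node.children if isinstance(c, Node))


def is_noise(node, max_link_density=0.5):
    """Rule: blacklisted tag. Statistic: share of link text above a threshold."""
    if node.tag in NOISE_TAGS:
        return True
    if node.tag == "a":
        return False  # inline links are judged through their container
    total = text_length(node)
    return total > 0 and link_text_length(node) / total > max_link_density


def extract_main_text(html, max_link_density=0.5):
    """Parse the page, prune noisy subtrees, and return the remaining text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    parts = []

    def walk(node):
        for child in node.children:
            if isinstance(child, str):
                parts.append(child)
            elif not is_noise(child, max_link_density):
                walk(child)

    walk(builder.root)
    return " ".join(parts)
```

Calling `extract_main_text(page_source)` on a page with a link-only menu block and a text-heavy content block should drop the menu and keep the content. The 0.5 threshold is an arbitrary starting point and would need tuning per site, which is presumably where domain-specific rules come in.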

Publisher
Database: Elsevier - ScienceDirect
Journal: The Journal of China Universities of Posts and Telecommunications - Volume 15, Supplement, September 2008, Pages 46-50