Exploiting structural similarity for effective Web information extraction

Article code	Journal code	Publication year	English article	Full-text version
379435	659301	2007	13-page PDF	Free download

Keywords

Semistructured data; Wrapping

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

Abstract

In this paper, we propose a classification technique for Web pages, based on the detection of structural similarities among semistructured documents, and devise an architecture exploiting such technique for the purpose of information extraction. The proposal significantly differs from standard methods based on graph-matching algorithms, and is based on the idea of representing the structure of a document as a time series in which each occurrence of a tag corresponds to an impulse. The degree of similarity between documents is then stated by analyzing the frequencies of the corresponding Fourier transform. Experiments on real data show the effectiveness of the proposed technique.
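The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how tags are encoded or how spectra are compared, so the choices below are assumptions made only for illustration. Each tag gets an integer code in order of first appearance, closing tags use the negated code, the shorter series is zero-padded, and similarity is measured as the Euclidean distance between DFT magnitude spectra. The names `TagCollector`, `tag_signal`, `dft_magnitudes`, and `structural_distance` are hypothetical.

```python
import cmath
import math
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects start and end tags in document order, ignoring text content."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, +1))

    def handle_endtag(self, tag):
        self.tags.append((tag, -1))


def tag_signal(html, codes):
    """Turn a document into a series of impulses, one per tag occurrence.

    Assumption: a tag's amplitude is an integer code assigned on first
    appearance (shared through `codes`); closing tags use the negated code.
    """
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    signal = []
    for name, sign in collector.tags:
        code = codes.setdefault(name, len(codes) + 1)
        signal.append(sign * code)
    return signal


def dft_magnitudes(signal, n):
    """Magnitudes of the n-point DFT of `signal`, zero-padded to length n."""
    padded = list(signal) + [0] * (n - len(signal))
    magnitudes = []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(padded)
        )
        magnitudes.append(abs(total))
    return magnitudes


def structural_distance(html_a, html_b):
    """Euclidean distance between the magnitude spectra of two tag signals.

    Assumption: zero-padding aligns series of different lengths; the paper
    may handle length differences in another way.
    """
    codes = {}  # shared so the same tag gets the same code in both documents
    a = tag_signal(html_a, codes)
    b = tag_signal(html_b, codes)
    n = max(len(a), len(b), 1)
    spectrum_a = dft_magnitudes(a, n)
    spectrum_b = dft_magnitudes(b, n)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spectrum_a, spectrum_b)))
```

Under these assumptions, two pages with the same markup but different text yield identical tag series and therefore a distance of zero, while pages built from different tags yield a positive distance. The direct DFT here runs in quadratic time; a real system would likely use an FFT.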

Publisher

Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 60, Issue 1, January 2007, Pages 222–234

Authors

Sergio Flesca, Giuseppe Manco, Elio Masciari, Luigi Pontieri, Andrea Pugliese
