Article code: 534034
Journal code: 870207
Publication year: 2015
English article: 8 pages, PDF
English title (ISI article)
Learning relational facts from the web: A tolerance rough set approach
Keywords
Tolerance rough sets; Granular methods; Web mining; Semi-supervised learning
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
Abstract

• A key issue when mining web information is the labeling problem.
• Tolerance rough sets are used to structure categorical instances and relations.
• We report a semi-supervised algorithm (TPL) that labels the ontological information.
• The Never Ending Language Learner (Nell) system provides the ontology.
• TPL performs comparably to CBS and CPL and handles concept drift.

A key issue when mining web information is the labeling problem: data are abundant on the web but unlabeled. In this paper, we address this problem by proposing (i) a granular model that uses a tolerance form of rough sets to structure categorical noun phrase instances, as well as semantically related noun phrase pairs, from a given corpus representing unstructured web pages, and (ii) a semi-supervised Tolerant Pattern Learning (TPL) algorithm that labels both categorical instances and relations. This work extends the TPL algorithm presented in our earlier paper. Our model treats noun phrases as objects described by the sets of contextual patterns with which they co-occur. We use the ontological information from the Never Ending Language Learner (Nell) system. We compared the performance of our algorithm with that of the Coupled Bayesian Sets (CBS) algorithm for categorical extraction and the Coupled Pattern Learner (CPL) algorithm for relational extraction. Experimental results suggest that TPL can achieve precision comparable to that of CBS and CPL.
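To make the idea of "noun phrases described as sets of co-occurring contextual patterns" structured by a tolerance form of rough sets more concrete, here is a minimal sketch of the generic tolerance rough set construction. It is not the authors' TPL algorithm. It assumes Jaccard similarity between pattern sets and a threshold `theta`; a noun phrase's tolerance class is every phrase at least `theta`-similar to it. The lower approximation of a labeled set contains phrases whose whole tolerance class is labeled, and the upper approximation contains phrases whose class touches the labeled set. The toy data, pattern strings, `theta`, and the rule of promoting unlabeled phrases in the upper approximation are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy corpus: each noun phrase is described by the set of
# contextual patterns it co-occurs with ("_" marks the noun-phrase slot).
CONTEXTS: Dict[str, FrozenSet[str]] = {
    "Toronto": frozenset({"_ is a city", "mayor of _", "flights to _"}),
    "Winnipeg": frozenset({"_ is a city", "mayor of _", "downtown _"}),
    "Ottawa": frozenset({"_ is a city", "flights to _", "downtown _"}),
    "Boeing": frozenset({"shares of _", "CEO of _", "flights to _"}),
    "Apple": frozenset({"shares of _", "CEO of _", "_ released"}),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two pattern sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tolerance_class(x: str, contexts: Dict[str, FrozenSet[str]], theta: float) -> Set[str]:
    """Noun phrases whose pattern sets are at least theta-similar to x's.

    x itself is included whenever its pattern set is non-empty and theta <= 1.
    """
    return {y for y in contexts if jaccard(contexts[x], contexts[y]) >= theta}


def approximations(
    target: Set[str], contexts: Dict[str, FrozenSet[str]], theta: float
) -> Tuple[Set[str], Set[str]]:
    """Lower and upper tolerance approximations of a labeled set of noun phrases."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for x in contexts:
        cls = tolerance_class(x, contexts, theta)
        if cls <= target:  # every tolerant neighbour is labeled: certainly in the category
            lower.add(x)
        if cls & target:  # some tolerant neighbour is labeled: possibly in the category
            upper.add(x)
    return lower, upper


if __name__ == "__main__":
    seeds = {"Toronto", "Winnipeg"}  # hypothetical seed instances for a "city" category
    lower, upper = approximations(seeds, CONTEXTS, theta=0.5)
    # Hypothetical promotion rule: unlabeled phrases in the upper approximation become candidates.
    candidates = upper - seeds
    print(sorted(lower), sorted(upper), sorted(candidates))
    # prints: [] ['Ottawa', 'Toronto', 'Winnipeg'] ['Ottawa']
```

With `theta=0.5`, Ottawa shares two of four patterns with each seed and enters the upper approximation as a candidate. Boeing also co-occurs with "flights to _", but its similarity to Toronto is only 0.2, so it stays out. The lower approximation is empty because every city's tolerance class still contains the unlabeled Ottawa. Raising or lowering `theta` in this sketch trades precision against recall.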

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 67, Part 2, 1 December 2015, Pages 130–137