Article code | Journal code | Publication year | English article | Full-text version
379153 | 659270 | 2008 | 14-page PDF | Free download
English title of the ISI article
Entity matching across heterogeneous data sources: An approach based on constrained cascade generalization
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract

To integrate or link the data stored in heterogeneous data sources, a critical problem is entity matching, i.e., matching records that represent semantically corresponding real-world entities across the sources. While decision tree techniques have been used to learn entity matching rules, most decision tree learners have an inherent representational bias: they generate univariate trees and restrict the decision boundaries to axis-orthogonal hyperplanes in the feature space. Cascading other classification methods with decision tree learners can alleviate this bias and potentially increase classification accuracy. In this paper, the authors apply a recently developed constrained cascade generalization method to entity matching and report on an empirical evaluation using real-world data. The evaluation results show that this method outperforms the base classification methods in terms of classification accuracy, especially in the dirtiest case.
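
The representational bias described in the abstract can be illustrated with a minimal sketch. It is not the authors' constrained method, whose details are not given here. It shows plain cascade generalization in its simplest form: the class probability from a base classifier is appended as an extra feature before an axis-orthogonal split is learned. The synthetic data, the two similarity features, the diagonal matching rule, and all function names are assumptions made for illustration only.

```python
import math
import random

def train_logistic(X, y, epochs=300, lr=0.5):
    """Fit a plain logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Probability of the 'match' class under the fitted logistic model."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def best_stump_accuracy(X, y):
    """Training accuracy of the best single axis-orthogonal split
    (one feature, one threshold, either polarity)."""
    n, best = len(X), 0.0
    for j in range(len(X[0])):
        for t in sorted(set(x[j] for x in X)):
            correct = sum((x[j] > t) == bool(yi) for x, yi in zip(X, y))
            best = max(best, correct / n, (n - correct) / n)
    return best

# Hypothetical data: two attribute-similarity scores per record pair;
# a pair is a match when the scores sum to more than 1 (a diagonal
# boundary that no single axis-orthogonal split can reproduce).
random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if s1 + s2 > 1.0 else 0 for s1, s2 in X]

# Univariate split on the raw features only.
acc_raw = best_stump_accuracy(X, y)

# Cascade step: append the base classifier's probability as a new feature.
base = train_logistic(X, y)
X_cascade = [x + [predict_proba(base, x)] for x in X]
acc_cascade = best_stump_accuracy(X_cascade, y)

print(f"stump on raw features:      {acc_raw:.3f}")
print(f"stump on cascaded features: {acc_cascade:.3f}")
```

Both accuracies are measured on the training data, so the comparison only shows what each representation can express, not how well it generalizes. A full decision tree learner would take the place of the single split in practice.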

Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 66, Issue 3, September 2008, Pages 368–381
Authors
, ,