Article code: 552888; Journal code: 873291; Publication year: 2008; English article: 8-page PDF; Full-text version: free download
English title of the ISI article
Entity matching in heterogeneous databases: A logistic regression approach
Related topics
Engineering and Basic Sciences; Computer Engineering; Information Systems
English abstract

This paper examines a widely encountered data heterogeneity problem, often faced in real-world decision support situations, called the entity heterogeneity problem, which arises when the same real-world entity type is represented using different identifiers in different applications. Supporting real-world decisions often requires identifying which entity in one application corresponds to which entity in a second application. Previous research has proposed decision models to resolve this problem. However, implementing those models requires either estimating probability parameters by manually matching a large sample of the existing data or estimating a distance measure based on user-specified weights. This paper proposes an alternative technique, based on logistic regression, for estimating the matching probabilities to be used in the matching decision model. This approach has been implemented and tested on real-world data. Comparison of the results with those from earlier approaches indicates that the proposed approach performs quite well and is certainly a viable approach in practical situations.
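
To make the idea of estimating matching probabilities with logistic regression concrete, here is a minimal sketch. It is not the paper's implementation: the three agreement indicators (`name_agrees`, `address_agrees`, `phone_agrees`), the tiny labeled sample, the gradient-descent fitting routine, and the small L2 penalty are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def match_probability(w, x):
    # w[0] is the intercept; w[1:] pair up with the agreement indicators in x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    # Batch gradient descent on the mean log-loss, with a small L2 penalty
    # on the non-intercept weights (an assumption, to keep weights finite
    # on this perfectly separable toy sample).
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = match_probability(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * (gj / n + (l2 * wj if j > 0 else 0.0))
             for j, (wj, gj) in enumerate(zip(w, grad))]
    return w

# Hypothetical labeled record pairs: [name_agrees, address_agrees, phone_agrees].
# Label 1 means the two records refer to the same real-world entity.
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
     [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w = fit_logistic(X, y)
for pair in ([1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]):
    print(pair, round(match_probability(w, pair), 3))
```

In a setting like the one the abstract describes, the fitted probability for each candidate pair would then be passed to a matching decision model rather than used on its own.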

Publisher
Database: Elsevier - ScienceDirect
Journal: Decision Support Systems - Volume 44, Issue 3, February 2008, Pages 740–747
Authors