Article ID: 552888
Journal: Decision Support Systems
Published Year: 2008
Pages: 8 Pages
File Type: PDF
Abstract

This paper examines a widely encountered data heterogeneity problem, often faced in real-world decision support situations, called the entity heterogeneity problem. It arises when the same real-world entity type is represented using different identifiers in different applications. Supporting real-world decisions often requires identifying which entity in one application corresponds to a given entity in a second application. Previous research has proposed decision models to resolve this problem. However, implementing those models requires either estimating probability parameters by manually matching a large sample of the existing data or estimating a distance measure based on user-specified weights. This paper proposes an alternative technique, based on logistic regression, for estimating the matching probabilities to be used in the matching decision model. The approach has been implemented and tested on real-world data. Comparison of the results with those from earlier approaches indicates that the proposed approach performs well and is viable in practical situations.
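The general idea of estimating a matching probability with logistic regression can be illustrated with a minimal sketch. It is not the paper's implementation: the two similarity features (name and address similarity), the toy training pairs, the learning rate, and the helper names `fit_logistic` and `match_probability` are all assumptions made for illustration. Each training row describes one pair of records, and the label says whether the pair refers to the same entity.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over the sample and take one descent step.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def match_probability(w, x):
    """Estimated probability that a record pair with features x is a match."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical training data: (name similarity, address similarity) per record pair.
X = [(0.95, 0.90), (0.90, 0.80), (0.85, 0.95), (0.80, 0.70),
     (0.20, 0.10), (0.30, 0.25), (0.10, 0.40), (0.60, 0.30)]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = same entity, 0 = different entities

weights = fit_logistic(X, y)
p_high = match_probability(weights, (0.92, 0.88))  # very similar pair
p_low = match_probability(weights, (0.15, 0.20))   # dissimilar pair
```

In this sketch the estimated probabilities would then feed a decision rule, for example declaring a match above some threshold. The form of that rule and its thresholds are left open here, since the abstract does not specify them.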

Related Topics
Physical Sciences and Engineering > Computer Science > Information Systems