Article ID Journal Published Year Pages File Type
378870 Data & Knowledge Engineering 2013 21 Pages PDF
Abstract

The concept of matching dependencies (mds) has recently been proposed for specifying matching rules for object identification. Similar to the functional dependencies (with conditions), mds can also be applied to various data quality applications such as detecting the violations of integrity constraints. In this paper, we study the problem of discovering similarity constraints for matching dependencies from a given database instance. First, we introduce the measures, support and confidence, for evaluating the utility of mds in the given data. Then, we study the discovery of mds with certain utility requirements of support and confidence. Exact algorithms are developed, together with pruning strategies to improve the time performance. Since the exact algorithm has to traverse all the data during the computation, we propose an approximate solution which only uses part of the data. A bound of relative errors introduced by the approximation is also developed. Finally, our experimental evaluation demonstrates the efficiency of the proposed methods.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,