کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
397518 671267 2009 17 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A strategy for allowing meaningful and comparable scores in approximate matching
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
A strategy for allowing meaningful and comparable scores in approximate matching
چکیده انگلیسی

Approximate data matching aims at assessing whether two distinct instances of data represent the same real-world object. The comparison between data values is usually done by applying a similarity function which returns a similarity score. If this score surpasses a given threshold, both data instances are considered as representing the same real-world object. These score values depend on the algorithm that implements the function and have no meaning to the user. In addition, score values generated by different functions are not comparable. This will potentially lead to problems when the scores returned by different similarity functions need to be combined for computing the similarity between records. In this article, we propose that thresholds should be defined in terms of the precision that is expected from the matching process rather than in terms of the raw scores returned by the similarity function. Precision is a widely known similarity metric and has a clear interpretation from the user's point of view. Our approach defines mappings from score values to precision values, which we call adjusted scores. In order to obtain such mappings, our approach requires training over a small dataset. Experiments show that training can be reused for different datasets on the same domain. Our results also demonstrate that existing methods for combining scores for computing the similarity between records may be enhanced if adjusted scores are used.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 34, Issue 8, December 2009, Pages 673–689
نویسندگان
, , , , , ,