Automatic threshold estimation for data matching applications

Publication year: 2011
English article: 15 pages (PDF)


Keywords

Quality estimation; Similarity function; Data matching

Related subjects

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence


Abstract

Several advanced data management applications, such as data integration, data deduplication, and similarity querying, rely on similarity functions. A similarity function requires a threshold value to decide whether two different data instances match, i.e., whether they represent the same real-world object. In this context, defining the threshold is a central problem. This paper proposes a method for estimating the quality of a similarity function. Quality is measured in terms of recall and precision calculated at several different thresholds. Based on the results of the proposed estimation process and the requirements of a specific application, a user can choose a suitable threshold value. The estimation process is based on a clustering phase performed over a data collection (or a sample thereof) and requires no human intervention, since the choice of similarity threshold is based on the silhouette coefficient, an internal quality measure for clusters. An extensive set of experiments on artificial and real datasets demonstrates the effectiveness of the proposed approach: in most cases, the estimation error was below 10% in terms of precision and recall.
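The estimation idea in the abstract (cluster the data at candidate thresholds, use the silhouette coefficient to pick a clustering without human labels, then report precision and recall per threshold) can be illustrated with a minimal sketch. It is one possible reading, not the authors' exact procedure. Assumptions: token-set Jaccard stands in for the similarity function, clustering is single-linkage (connected components of pairs with similarity >= t), distance is 1 - similarity, and the silhouette-best clustering serves as pseudo ground truth. All function names (`jaccard`, `cluster_at`, `silhouette`, `estimate_quality`) are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed example similarity function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_at(S, t):
    """Single-linkage clustering: link i and j when S[i][j] >= t; return one label per record."""
    n = len(S)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if S[i][j] >= t:
            parent[find(i)] = find(j)
    ids = {}  # map union-find roots to labels 0..k-1 in order of first appearance
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def silhouette(labels, D):
    """Mean silhouette coefficient over all records; None when undefined (k < 2 or k == n)."""
    n, k = len(labels), len(set(labels))
    if k < 2 or k > n - 1:
        return None
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # a singleton's silhouette is defined as 0
        a = sum(D[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(D[i][j] for j in m) / len(m) for o, m in clusters.items() if o != c)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def estimate_quality(records, sim, thresholds):
    """Pick a reference clustering by silhouette, then estimate precision/recall per threshold."""
    n = len(records)
    S = [[sim(records[i], records[j]) for j in range(n)] for i in range(n)]
    D = [[1.0 - s for s in row] for row in S]  # distance = 1 - similarity
    clusterings = {t: cluster_at(S, t) for t in thresholds}
    scored = [(silhouette(lab, D), t) for t, lab in clusterings.items()]
    scored = [(s, t) for s, t in scored if s is not None]
    if not scored:
        raise ValueError("no threshold yields a clustering with a defined silhouette")
    best_t = max(scored)[1]
    ref = clusterings[best_t]
    # Pseudo ground truth: pairs that share a cluster in the silhouette-best clustering.
    reference = {(i, j) for i, j in combinations(range(n), 2) if ref[i] == ref[j]}
    report = {}
    for t in thresholds:
        # Pairs the similarity function alone would declare a match at threshold t.
        predicted = {(i, j) for i, j in combinations(range(n), 2) if S[i][j] >= t}
        tp = len(predicted & reference)
        precision = tp / len(predicted) if predicted else 1.0
        recall = tp / len(reference) if reference else 1.0
        report[t] = (precision, recall)
    return best_t, report


if __name__ == "__main__":
    records = [
        "john smith new york", "jon smith new york", "john smith ny",
        "mary jones boston", "mary jones boston ma", "peter brown chicago",
    ]
    best_t, report = estimate_quality(records, jaccard, [0.3, 0.5, 0.7, 0.9])
    print("silhouette-selected threshold:", best_t)
    for t, (p, r) in report.items():
        print(f"t={t}: estimated precision={p:.2f}, recall={r:.2f}")
```

On these six toy records, the clustering at t = 0.3 has the highest silhouette (about 0.44), so it becomes the reference. Estimated recall then falls from 0.75 at t = 0.3 to 0.50, 0.25, and 0.0 at the higher thresholds. In this sketch, any threshold at or above the selected one always gets estimated precision 1.0, because every pair it accepts is already linked in the reference clustering. That is a simplification of this illustration, not a claim about the paper's estimator.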

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 181, Issue 13, 1 July 2011, Pages 2685–2699

Authors

Juliana B. dos Santos, Carlos A. Heuser, Viviane P. Moreira, Leandro K. Wives
