Article code  Journal code  Publication year  English article  Full text
384319  660843  2010  12 pages, PDF  Free download
English title of the ISI article
Automatic accuracy assessment via hashing in multiple-source environment
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

Accuracy is one of the most important data quality dimensions, and its assessment is a key issue in data management. Most current studies focus on analyzing the accuracy dimension qualitatively, and that analysis depends heavily on experts' knowledge. Little work has addressed how to quantify the accuracy dimension automatically. Based on the Jensen–Shannon divergence (JSD) measure, we propose that the accuracy of data can be quantified automatically by comparing the data with its entity's closest approximation in the available context. To identify the closest approximation quickly in large-scale data sources, locality-sensitive hashing (LSH) is employed to extract it at multiple levels, namely the column, record and field levels. Our approach not only gives each data source an objective accuracy score very quickly, as long as a context member is available, but also avoids laborious human interaction. As an automatic accuracy assessment solution for multiple-source environments, our approach is particularly well suited to large-scale data sources. Theory and experiments show that our approach performs well in obtaining metadata on the accuracy dimension.
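
To make the JSD-based comparison concrete, the sketch below scores one field value against a reference value that is assumed to have been found already (the paper uses LSH for that search step). It is only an illustration under stated assumptions, not the authors' implementation: the character-frequency featurization, the function names, and the mapping from divergence to score (`1 - JSD`) are all hypothetical choices made here. With base-2 logarithms, JSD lies in [0, 1], so the resulting score does too.

```python
import math
from collections import Counter


def char_distribution(value: str) -> dict[str, float]:
    """Relative character frequencies of a field value.

    Assumption: the abstract does not say how data is turned into a
    probability distribution; character frequencies are a stand-in.
    """
    counts = Counter(value)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) in bits; q must be non-zero wherever p is non-zero."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def jsd(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence: mean KL divergence to the mixture m."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def accuracy_score(value: str, reference: str) -> float:
    """Hypothetical score: 1 means identical distributions, 0 means disjoint."""
    return 1.0 - jsd(char_distribution(value), char_distribution(reference))
```

Under these assumptions, a value that matches its reference scores 1, a value sharing no characters with it scores 0, and a value with a small typo falls in between.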

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 37, Issue 3, 15 March 2010, Pages 2609–2620