Article code | Journal code | Publication year | English article | Full-text version
6856190 | 1437948 | 2018 | 18-page PDF | Free download
English title of the ISI article
Relational data imputation with quality guarantee
Persian translation of the title
محاسبه داده های مرتبط با تضمین کیفیت (literally: "Computing related data with quality assurance")
Keywords
Information asymmetry, data cleaning, quality guarantee, Generalized Feature Dependency
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Missing attribute values are prevalent in real relational data, especially data extracted from the Web. Their accurate imputation is important for ensuring high-quality data analytics. Even though many techniques have been proposed for this task, none of them provides a flexible mechanism for quality control. The lack of a quality guarantee may cause many missing values to be filled with wrong values, which can easily bias data analysis. In this paper, we first propose a novel probabilistic framework based on the concept of Generalized Feature Dependency (GFD). By exploiting the monotonicity between imputation precision and match probability, it enables a flexible mechanism for quality control. We then present the imputation model with a precision guarantee and the techniques to maximize recall while meeting a user-specified precision requirement. Finally, we evaluate the performance of the proposed approach on real data. Our extensive experiments show that it has a performance advantage over the state-of-the-art alternatives and, most importantly, that its quality control mechanism is effective.
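As a rough illustration of the quality-control idea in the abstract (trading recall against a user-specified precision requirement by thresholding on match probability), the following minimal sketch may help. It is not the paper's algorithm. It assumes that each candidate imputation already comes with a calibrated match probability, so that the mean probability of the accepted candidates estimates their precision. The function name `select_imputations`, the candidate identifiers, and the probabilities are all hypothetical.

```python
from typing import List, Tuple


def select_imputations(
    candidates: List[Tuple[str, float]], min_precision: float
) -> List[Tuple[str, float]]:
    """Accept as many candidates as possible while the expected precision
    (mean match probability of the accepted set) stays >= min_precision.

    Assumption: match probabilities are calibrated, so their mean is a
    reasonable estimate of the precision of the accepted imputations.
    """
    # Rank candidates from most to least likely to be correct.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)

    accepted = 0
    running_sum = 0.0
    for i, (_, prob) in enumerate(ranked, start=1):
        running_sum += prob
        # With a descending ranking the running mean never increases,
        # so the first violation marks the largest admissible prefix.
        if running_sum / i >= min_precision:
            accepted = i
        else:
            break
    return ranked[:accepted]


# Hypothetical candidates: (cell identifier, match probability).
candidates = [
    ("t1.city", 0.95),
    ("t2.city", 0.40),
    ("t3.zip", 0.90),
    ("t4.zip", 0.70),
]
chosen = select_imputations(candidates, min_precision=0.8)
print([name for name, _ in chosen])  # ['t1.city', 't3.zip', 't4.zip']
```

Because the candidates are sorted by descending probability, lowering the acceptance threshold can only lower the estimated precision. This monotonic behaviour is what lets a single pass find the largest accepted set, and hence the highest recall, for a given precision target.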
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 465, October 2018, Pages 305-322