Article code: 378781
Journal code: 659217
Publication year: 2013
English article: 22-page PDF
Full-text version: Free download
English title of the ISI article
Empowering integration processes with data provenance
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

In some integration applications, users are allowed to import data from heterogeneous sources, but are not allowed to update these source data directly. Imported data may be inconsistent, and even when inconsistencies are detected and resolved, these changes may not be propagated to the sources due to their update policies. Therefore, the sources continue to provide the same inconsistent data in future imports until the proper authority updates them. In this paper, we propose PrInt, a model that allows the user's data-cleaning decisions to be automatically reapplied in subsequent integration processes. By reproducing previous decisions, the user may focus only on new inconsistencies that originate from modified source data. The reproducibility provided by PrInt is based on logging and on incorporating data provenance into the integration process. Other major features of PrInt are as follows. It is based on a repository of operations, which contains provenance data and represents the integration decisions that the user takes to resolve attribute value conflicts among data sources. It is designed to maintain the consistency of the repository and to provide a strict reproduction of the user's decisions by guaranteeing the validity of operations and by reapplying only valid operations. It is also designed to safely reorder the operations stored in the repository to improve the performance of the reapplication process. We applied PrInt to a real application, and the experimental results showed remarkable performance gains. Reapplying the user's decisions based on our model was at least 89% faster than naïvely re-executing the integration process. We conclude that the characteristics of PrInt make the integration process less error-prone and less time-consuming.
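
The idea of logging decisions together with their provenance and reapplying only those that are still valid can be illustrated with a minimal sketch. This is not the PrInt implementation: the `Decision` record, the `snapshot` and `replay` helpers, and the validity rule (a decision is valid only if every source still provides exactly the values the user saw) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One logged user decision resolving an attribute value conflict (hypothetical)."""
    key: str          # entity identifier shared by the sources
    attribute: str    # attribute whose values conflicted
    seen: tuple       # provenance: sorted (source, value) pairs the user saw
    chosen: str       # value the user picked


def snapshot(values_by_source):
    """Canonical, hashable form of the values each source currently provides."""
    return tuple(sorted(values_by_source.items()))


def replay(repository, imported):
    """Reapply logged decisions that are still valid; return the rest as pending.

    `imported` maps (key, attribute) -> {source_name: value}.
    """
    log = {(d.key, d.attribute): d for d in repository}
    resolved, pending = {}, []
    for cell, values_by_source in imported.items():
        distinct = set(values_by_source.values())
        if len(distinct) == 1:
            # All sources agree, so there is no conflict to resolve.
            resolved[cell] = next(iter(distinct))
            continue
        decision = log.get(cell)
        if decision is not None and decision.seen == snapshot(values_by_source):
            # Assumed validity rule: the sources are unchanged since the decision.
            resolved[cell] = decision.chosen
        else:
            # New or changed conflict: the user has to decide.
            pending.append(cell)
    return resolved, pending
```

Under these assumptions, a second import only asks the user about cells in `pending`; a decision whose source values have changed is treated as invalid and is not reapplied.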

Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 86, July 2013, Pages 102–123