کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
474216 698850 2007 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A branch-and-cut algorithm for the continuous error localization problem in data cleaning
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر علوم کامپیوتر (عمومی)
پیش نمایش صفحه اول مقاله
A branch-and-cut algorithm for the continuous error localization problem in data cleaning
چکیده انگلیسی

Data collected by statistical agencies may contain mistakes made during the acquisition, transcription and coding process. Thus, before using all this information to infer statistical properties, the statistical agencies must check the consistency of their information. To this end, each record has to be tested on a set of consistency rules. Therefore, if one of these records does not meet all the consistency rules, it must be established which fields have to be modified in order to make the new record valid with respect to that set of consistency rules. Among all the possible solutions, statistical agencies are interested in finding one in which the number of fields that should be modified is minimum, thus leading to a combinatorial optimization problem known as the Error Localization Problem. This article approaches the optimization problem of finding the smallest set of fields whose values must be changed in order to satisfy a given set of consistency rules. With this purpose in mind an Integer Linear Programming model is proposed for the particular case in which the fields are continuous values and the consistency rules are given by linear inequalities. This model is solved through a branch-and-cut approach based on a Benders’ decomposition. The new proposal is compared to others previously published in the literature and tested on benchmark instances. The overall performance of these new algorithms succeeded in solving to optimality difficult instances with up to 100 fields in a record in about 1 min of a personal computer.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computers & Operations Research - Volume 34, Issue 9, September 2007, Pages 2790–2804
نویسندگان
, ,