کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
10366957 872885 2005 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A formal framework for database sampling
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر تعامل انسان و کامپیوتر
پیش نمایش صفحه اول مقاله
A formal framework for database sampling
چکیده انگلیسی
Database sampling is commonly used in applications like data mining and approximate query evaluation in order to achieve a compromise between the accuracy of the results and the computational cost of the process. The authors have recently proposed the use of database sampling in the context of populating a prototype database, that is, a database used to support the development of data-intensive applications. Existing methods for constructing prototype databases commonly populate the resulting database with synthetic data values. A more realistic approach is to sample a database so that the resulting sample satisfies a predefined set of integrity constraints. The resulting database, with domain-relevant data values and semantics, is expected to better support the software development process. This paper presents a formal study of database sampling. A Denotational Semantics description of database sampling is first discussed. Then the paper characterises the types of integrity constraints that must be considered during sampling. Lastly, the sampling strategy presented here is applied to improve the data quality of a (legacy) database. In this context, database sampling is used to incrementally identify the set of tuples which are the cause of inconsistencies in the database, and therefore should be the ones to be addressed by the data cleaning process.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information and Software Technology - Volume 47, Issue 12, 1 September 2005, Pages 819-828
نویسندگان
, , ,