Article code: 1149647
Journal code: 957891
Publication year: 2012
English article: 13-page PDF
Full-text version: Free download
English title of the ISI article
Data cloning: Data visualisation, smoothing, confidentiality, and encryption
Related topics
Engineering and Basic Sciences, Mathematics, Applied Mathematics
English abstract

One simple way to change data for simple linear regression and still get the same fitted parameters is to add each of the residuals from the first model fit to each original observation. For n initial data points (x, y) this creates n² observations. More generally, adding {a_i: i = 1,…,m} to each observation produces mn new observations with the same simple linear regression fit, provided the sum over i of the a_i is zero. An alternative method, after mean adjustment, is to regress y on x and x on y, and use the predicted values ŷ and x̂ as new data; the regressions of ŷ on x̂ and of y on x are identical, as are the correlations between x and y, and between x̂ and ŷ. The underlying principle can be extended to simple linear regression with intercept, to multiple linear regression, and to situations where the design matrix is not full rank and/or the data are not independent and identically distributed. For multiple linear regression, the procedure can be repeated many times, each time producing a new dataset with the same multiple linear regression fit as the original data. We call these datasets “cloned” or “matching”. One major advantage of such datasets is that, unlike the more usual model-based alternatives, parameter estimates of the original data and the cloned data are identical and include no model error. Data cloning consequently has potential uses in a wide range of applications from confidentialising or encrypting data, to data visualisation and smoothing. The encryption application is particularly interesting because it can be applied generally to databases even where there is no interest in regression modelling.
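The two simple-regression constructions in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, simulated data, and (for the first construction) a least-squares line with intercept, so that the residuals sum to zero. The function names `fit_line` and `clone_by_offsets` are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def clone_by_offsets(x, y, offsets):
    """Pair every observation with every offset, giving m * n points."""
    offsets = np.asarray(offsets, dtype=float)
    x_new = np.repeat(x, offsets.size)               # each x_i repeated m times
    y_new = (y[:, None] + offsets[None, :]).ravel()  # y_i + a_j for all i, j
    return x_new, y_new

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 1.5 * x + 0.7 + rng.normal(scale=0.5, size=20)

# Construction 1: add every residual to every observation (n^2 points).
slope, intercept = fit_line(x, y)
residuals = y - (slope * x + intercept)
x_clone, y_clone = clone_by_offsets(x, y, residuals)
clone_slope, clone_intercept = fit_line(x_clone, y_clone)

# Construction 2: mean-adjust, then swap in the predicted values.
xc, yc = x - x.mean(), y - y.mean()
b = (xc @ yc) / (xc @ xc)    # slope of y on x (no intercept after centring)
c = (xc @ yc) / (yc @ yc)    # slope of x on y
y_hat, x_hat = b * xc, c * yc
b_hat = (x_hat @ y_hat) / (x_hat @ x_hat)  # slope of y_hat on x_hat

print(np.allclose([slope, intercept], [clone_slope, clone_intercept]))
print(np.isclose(b, b_hat))
```

The first check works because the offsets sum to zero, which leaves both the mean of y and the cross-products with x unchanged up to a factor of m. The second works because the slope of ŷ on x̂ reduces algebraically to the slope of y on x.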


► An algorithm is given that generates cloned data with the same multiple regression.
► Many cloned datasets can be created, each with identical parameter estimates.
► Cloning has a wide range of uses, even if regression is not the main interest.
► Cloning has potential for confidentialising and encrypting data.
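The paper's algorithm for multiple regression is not reproduced on this page. As a rough illustration of why many datasets can share one fit, the sketch below uses a standard least-squares property rather than the authors' procedure: adding to y any vector orthogonal to the columns of the design matrix X leaves the coefficient estimates unchanged. It assumes NumPy, a full-rank X, and simulated data. The function name `clone_response` is hypothetical, and only the response is altered here, not X.

```python
import numpy as np

def clone_response(X, y, rng):
    """Return a new response whose OLS coefficients on X match those of y."""
    z = rng.normal(size=y.shape)                   # arbitrary perturbation
    z_fit, *_ = np.linalg.lstsq(X, z, rcond=None)  # part of z explained by X
    return y + (z - X @ z_fit)                     # add only the orthogonal part

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Repeat the construction to obtain several different clones.
clones = [clone_response(X, y, rng) for _ in range(5)]
clone_betas = [np.linalg.lstsq(X, yc, rcond=None)[0] for yc in clones]

print(all(np.allclose(bc, beta) for bc in clone_betas))
```

This sketch preserves the coefficient estimates only; the residuals, and therefore the standard errors, differ between clones.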

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Statistical Planning and Inference - Volume 142, Issue 2, February 2012, Pages 410–422
Authors
, ,