Article ID: 7562970
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2015
Pages: 8
File Type: PDF
Abstract
In chemometrics and other areas of data analysis, the development of new methods for statistical inference and prediction is the focus of many studies. Documenting the properties of new methods is unavoidable, and simulated data are often used for this purpose. However, there are few standard approaches to simulating data. In this paper we propose a transparent and versatile method for simulating response and predictor data from a multiple linear regression model, which we hope may serve as a standard tool for simulating linear model data. The approach uses the principle of a relevant subspace for prediction, which is known from both Partial Least Squares and envelope models, and is essentially based on a re-parametrization of the random-x regression model. The approach also allows for defining a subset of relevant observable predictor variables that span the relevant latent subspace, which is useful for exploring variable selection methods. The data properties are controlled by a small set of input parameters set by the analyst. The approach can be used to simulate a wide variety of data with varying properties in order to compare statistical methods. The method has been implemented in an R package, and its use is illustrated by examples.
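The idea of simulating from a random-x regression model through a relevant latent subspace can be illustrated with a small sketch. This is not the R package mentioned in the abstract, and the abstract does not give the parametrization, so everything specific here is an assumption made for illustration: the function names `random_orthogonal` and `simulate`, the eigenvalue decay `exp(-gamma * j)`, the equal split of R² across the relevant components, the unit variance of the response, and the random rotation from latent components to predictors. The sketch also lets every predictor carry information, so it does not implement the option of restricting relevance to a subset of observable predictors.

```python
import math
import random


def random_orthogonal(p, rng):
    """Return a p x p orthogonal matrix (list of rows) built by Gram-Schmidt."""
    basis = []
    while len(basis) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Remove the projections onto the rows accepted so far.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:
            basis.append([vi / norm for vi in v])
    return basis


def simulate(n, p, relevant, r2, gamma, seed=0):
    """Simulate (X, y, beta) from a linear model with relevant latent components.

    Hypothetical parametrization (assumed, not taken from the paper):
      - latent components z_j are independent with variance exp(-gamma * j)
      - only the components listed in `relevant` covary with y
      - each relevant component contributes r2 / len(relevant) to R^2
      - y has unit variance; predictors are x = R z for a random rotation R
    """
    rng = random.Random(seed)
    lam = [math.exp(-gamma * j) for j in range(p)]

    # Covariances between y and the latent components (zero if irrelevant).
    cov_zy = [0.0] * p
    for j in relevant:
        cov_zy[j] = math.sqrt(r2 * lam[j] / len(relevant))

    # Regression coefficients of y on the latent components.
    alpha = [c / l for c, l in zip(cov_zy, lam)]

    rot = random_orthogonal(p, rng)
    noise_sd = math.sqrt(1.0 - r2)

    X, y = [], []
    for _ in range(n):
        z = [rng.gauss(0.0, math.sqrt(l)) for l in lam]
        y.append(sum(a * zj for a, zj in zip(alpha, z)) + rng.gauss(0.0, noise_sd))
        X.append([sum(rot[i][j] * z[j] for j in range(p)) for i in range(p)])

    # True coefficients on the observable predictors: beta = R alpha.
    beta = [sum(rot[i][j] * alpha[j] for j in range(p)) for i in range(p)]
    return X, y, beta


if __name__ == "__main__":
    X, y, beta = simulate(n=200, p=5, relevant=[0, 1], r2=0.8, gamma=0.5)
    print(len(X), len(X[0]), len(y))
    print([round(b, 3) for b in beta])
```

In this sketch a handful of inputs (`n`, `p`, `relevant`, `r2`, `gamma`) determine the data properties: a larger `gamma` makes the predictors more collinear, and placing relevant components at later positions ties the response to low-variance directions, which is the kind of setting where prediction methods tend to differ.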
Related Topics
Physical Sciences and Engineering > Chemistry > Analytical Chemistry