Article ID Journal Published Year Pages File Type
1181628 Chemometrics and Intelligent Laboratory Systems 2008 10 Pages PDF
Abstract

Whenever regression models are optimised, it is important that all optimisation steps are properly validated. Variable selection is one example of parameter estimation that will give overly optimistic models if not included in the validation. There are many examples of reported work where the validation is performed posterior to variable selection, and many have correctly noted that these models are optimistically biased. However, if the availability of samples is limited, separation of the data into a training and validation set may decrease the quality of both the calibration model and the validation. Cross model validation is designed to validate the optimisation by including the variable selection in an extra layer of cross-validation. This means that all available samples are utilised both in the training and for estimating the residual error of the model.Cross model validation poses challenging questions both conceptually and algorithmically, and a presentation of the full work-flow is needed. We present a complete framework including optimisation, validation and calibration of bilinear regression models with variable selection. Several issues are addressed that are important for each separate stage of the analysis, and suggestions for improvements are proposed. The method is validated on a gene expression data set with a low signal-to-noise ratio and a small number of samples. It is shown that many replicates are needed to model these data properly, and that cross model validated variable selection improves both the final calibration model and the associated error estimates. A Matlab toolbox (Mathworks Inc, USA) is available from www.specmod.org.

Related Topics
Physical Sciences and Engineering Chemistry Analytical Chemistry
Authors
, , ,