Article ID | Journal | Published Year | Pages |
---|---|---|---|
1148425 | Journal of Statistical Planning and Inference | 2008 | 13 Pages |
Abstract
There has been ever-increasing interest in the use of microarray experiments as a basis for the provision of prediction (discriminant) rules for improved diagnosis of cancer and other diseases. Typically, microarray cancer studies provide only a limited number of tissue samples from the specified classes of tumours or patients, whereas each tissue sample may contain the expression levels of thousands of genes. Thus researchers are faced with the problem of forming a prediction rule on the basis of a small number of classified tissue samples, each of very high dimension. Usually, some form of feature (gene) selection is adopted in the formation of the prediction rule. As the subset of genes used in the final form of the rule has not been randomly selected but rather chosen according to some criterion designed to reflect the predictive power of the rule, there will be a selection bias inherent in estimates of the error rates of the rule if care is not taken. We present various situations where selection bias arises in the formation of a prediction rule and where there is a consequent need to correct for this bias. We describe the design of cross-validation schemes that are able to correct for the various selection biases.
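The selection bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' scheme: the data are pure noise (so any rule has a true error rate of 50%), genes are ranked by the absolute difference in class means, and a nearest-centroid rule is used as the classifier. All function names, the ranking criterion, the classifier, and the sample sizes are assumptions made for illustration only.

```python
import random

def make_noise_data(n_samples=40, n_genes=2000, seed=0):
    """Simulate expression data with NO class signal (assumed setup)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced two-class labels
    return X, y

def class_means(X, y, rows, genes):
    """Per-class mean expression of `genes`, computed over `rows` only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    return means

def select_genes(X, y, rows, k):
    """Keep the k genes with the largest absolute class-mean difference."""
    all_genes = range(len(X[0]))
    m = class_means(X, y, rows, all_genes)
    scores = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_genes, key=lambda g: scores[g], reverse=True)[:k]

def predict(X, y, train_rows, genes, test_row):
    """Nearest-centroid prediction using only the selected genes."""
    m = class_means(X, y, train_rows, genes)
    def dist(label):
        return sum((X[test_row][g] - c) ** 2 for g, c in zip(genes, m[label]))
    return 0 if dist(0) <= dist(1) else 1

def cv_error(X, y, k=10, n_folds=5, select_inside=True, seed=1):
    """Cross-validated error; gene selection is redone per fold if select_inside."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):  # spread each class evenly over the folds
        members = [i for i in range(len(y)) if y[i] == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    all_rows = list(range(len(y)))
    # Selection on ALL samples: the held-out samples leak into the rule.
    genes_outside = select_genes(X, y, all_rows, k)
    errors = 0
    for fold in folds:
        train_rows = [i for i in all_rows if i not in fold]
        genes = select_genes(X, y, train_rows, k) if select_inside else genes_outside
        errors += sum(predict(X, y, train_rows, genes, i) != y[i] for i in fold)
    return errors / len(y)

if __name__ == "__main__":
    X, y = make_noise_data()
    print("selection outside CV:", cv_error(X, y, select_inside=False))
    print("selection inside CV: ", cv_error(X, y, select_inside=True))
```

Because the labels carry no information, an honest estimate should sit near 0.5. Selecting genes once on the full data set before cross-validating typically yields a much lower, optimistically biased figure, whereas repeating the selection inside each training fold removes that source of bias.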
Related Topics
Physical Sciences and Engineering
Mathematics
Applied Mathematics
Authors
J.X. Zhu, G.J. McLachlan, L. Ben-Tovim Jones, I.A. Wood