Article ID Journal Published Year Pages File Type
1243684 Talanta 2009 8 Pages PDF
Abstract

Microarrays are used to simultaneously determine the expressions of thousands of genes. An important application of microarrays is in the classification of samples into classes of interest (e.g. either healthy cells or tumour cells). Discriminant partial least squares (DPLS) has often been used for this purpose. In this paper, we describe an improvement to DPLS that uses kernel-based probability density functions and the Bayes rule to classify samples whilst keeping the option of not classifying the sample if this cannot be done with sufficient confidence. With this approach, those samples outside the boundaries of the known classes or from the ambiguity region between classes are rejected and only samples with a high probability of being correctly classified are indeed classified. The optimal model is found by simultaneously minimizing the misclassification and rejection costs. The method (p-DPLS with reject option) was tested with two datasets. For the human cancers dataset the accuracy (obtained by leave-one-out cross-validation) was improved from 97% to 99% when compared to p-DPLS without reject option. For the breast cancer dataset, p-DPLS with reject option was able to reject 100% of the test samples that did not belong to any of the modelled classes. These samples would have been misclassified if the reject option had not been considered.

Related Topics
Physical Sciences and Engineering Chemistry Analytical Chemistry
Authors
, , ,