Gene and sample selection for cancer classification with support vectors based t-statistic

Article ID	Journal	Published Year	Pages	File Type
410363	Neurocomputing	2010	10 Pages	PDF

Abstract

The t-statistic is widely used for gene ranking in the analysis of microarray gene expression data. Such a filter-based criterion is generally computed using all the training samples, not all of which, however, may be equally important for the classification task. In this paper, we decompose the t-statistic into two parts, corresponding to relevant and irrelevant data points. The relevant data points are selected using support vectors and then used to compute the t-statistic for feature selection. By simultaneously selecting data points and genes, significantly better classification results are achieved on synthetic data as well as on several benchmark cancer datasets.
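The core idea of computing the t-statistic on a subset of samples can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `t_statistic` and `rank_genes`, the toy data, and the Welch-style form of the statistic are assumptions made for illustration, since the abstract does not give the exact formula. The indices in `relevant_idx` are assumed to come from a previously trained SVM (for example, the support-vector indices that scikit-learn's `SVC` exposes as `support_`); training the SVM itself is outside this sketch.

```python
import math
from statistics import mean, variance


def t_statistic(pos, neg):
    """Welch-style two-sample t-statistic for one gene (assumed form)."""
    # Standard error from the per-class sample variances.
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0.0:
        return 0.0  # no spread in either class: treat the gene as uninformative
    return (mean(pos) - mean(neg)) / se


def rank_genes(X, y, relevant_idx):
    """Rank genes by |t| computed only on the samples in relevant_idx.

    X: list of samples, each a list of gene expression values.
    y: class labels, +1 or -1.
    relevant_idx: indices of the samples treated as relevant
                  (hypothetically, the support vectors of a trained SVM).
    """
    rows = [X[i] for i in relevant_idx]
    labels = [y[i] for i in relevant_idx]
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        pos = [r[g] for r, lab in zip(rows, labels) if lab == 1]
        neg = [r[g] for r, lab in zip(rows, labels) if lab == -1]
        scores.append(abs(t_statistic(pos, neg)))
    # Gene indices sorted from most to least discriminative.
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores


# Toy data: 6 samples, 2 genes. Gene 0 separates the classes; gene 1 does not.
X = [
    [5.0, 1.0],
    [5.2, 3.0],
    [4.8, 2.0],
    [1.0, 2.5],
    [1.2, 1.5],
    [0.8, 2.2],
]
y = [1, 1, 1, -1, -1, -1]

# Hypothetical support-vector indices; each class needs at least two samples
# so that the sample variance is defined.
relevant_idx = [0, 1, 3, 4]

order, scores = rank_genes(X, y, relevant_idx)
print(order, scores)
```

In this sketch, gene selection and sample selection interact only through `relevant_idx`: changing which samples are treated as relevant changes the per-gene scores and therefore the ranking.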

Keywords

SVM-RFE; Instance selection; Feature selection; Wrapper; Filter