Leukemia and small round blue-cell tumor cancer detection using microarray gene expression data set: Combining data dimension reduction and variable selection technique

Article ID	Journal	Published Year	Pages	File Type
1180637	Chemometrics and Intelligent Laboratory Systems	2014	9 Pages	PDF

Abstract

•A new method is proposed for analyzing high-dimensional data sets (gene expression microarrays).
•The proposed method, based on the clustering of variables (CLoVA) concept, can solve the small sample size problem in LDA.
•The CLoVA concept combined with PCA was used as an efficient method for data dimension reduction.
•The method was applied successfully to the detection of Leukemia and small round blue-cell tumor cancers.

Gene expression data play an important role in cancer classification and in solving fundamental problems of cancer diagnosis. Because microarrays measure a very large number of genes for each healthy or patient sample, a variable selection method can be applied to reduce model complexity and improve classification performance. However, variable selection procedures risk over-fitting when the number of variables is large relative to the number of samples. The present study therefore proposes a method that couples data dimension reduction with variable selection. The approach first clusters the variables of the original data set. A local principal component analysis (PCA) model is then built on each cluster, and only the significant components of these local models are retained. Finally, the variable selection algorithm is run on these locally derived principal component variables. The proposed algorithm was evaluated on two gene expression data sets, namely acute Leukemia and small round blue-cell tumor (SRBCT). The results confirmed that the classification models built on the reduced data were better than those obtained on the entire microarray gene expression profile.
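The dimension reduction step described above (cluster the variables, then keep the leading components of a local PCA on each cluster) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: plain k-means stands in for the clustering step, "significant" components are taken to be those needed to reach a cumulative explained-variance threshold, and the data are synthetic. The function names `cluster_variables` and `local_pca_scores` and the parameter `var_threshold` are hypothetical. The subsequent variable selection and LDA classification are omitted.

```python
import numpy as np


def cluster_variables(X, n_clusters, n_iter=50, seed=0):
    """Group the columns (variables) of X with plain k-means.

    Assumption: k-means is used here only as a simple stand-in
    for the variable clustering step.
    """
    rng = np.random.default_rng(seed)
    # Standardize each variable so clustering reflects profile shape, not scale.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    V = Z.T  # one row per variable
    centers = V[rng.choice(V.shape[0], size=n_clusters, replace=False)]
    labels = np.zeros(V.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared distance from every variable to every cluster center.
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            members = V[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return labels


def local_pca_scores(X, labels, var_threshold=0.9):
    """Run a PCA on each variable cluster and keep its leading components.

    Assumption: a component counts as significant if it is needed to reach
    `var_threshold` of the cluster's cumulative explained variance.
    """
    blocks = []
    for k in np.unique(labels):
        Xk = X[:, labels == k]
        Xk = Xk - Xk.mean(axis=0)  # mean-center before PCA
        U, s, _ = np.linalg.svd(Xk, full_matrices=False)
        total = (s ** 2).sum()
        if total == 0:
            continue  # cluster carries no variance
        explained = np.cumsum(s ** 2) / total
        n_keep = min(int(np.searchsorted(explained, var_threshold)) + 1, len(s))
        blocks.append(U[:, :n_keep] * s[:n_keep])  # PCA scores for this cluster
    return np.hstack(blocks)


# Synthetic stand-in for a microarray matrix: 20 samples, 60 variables.
rng = np.random.default_rng(1)
latent = rng.normal(size=(20, 3))
loadings = rng.normal(size=(3, 60))
X = latent @ loadings + 0.1 * rng.normal(size=(20, 60))

labels = cluster_variables(X, n_clusters=3)
scores = local_pca_scores(X, labels)
print(X.shape, "->", scores.shape)
```

The columns of `scores` play the role of the locally derived principal component variables on which a variable selection algorithm and a classifier would then operate.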

Keywords

Microarray gene expression; Linear discriminant analysis, LDA; Genetic algorithm; Kohonen self-organizing map