Article ID Journal Published Year Pages File Type
9131911 Genomics 2005 8 Pages PDF
Abstract
Development of a robust and efficient approach for extracting useful information from microarray data continues to be a significant and challenging task. Microarray data are characterized by a high dimension, high signal-to-noise ratio, and high correlations between genes, but with a relatively small sample size. Current methods for dimensional reduction can further be improved for the scenario of the presence of a single (or a few) high influential gene(s) in which its effect in the feature subset would prohibit inclusion of other important genes. We have formalized a robust gene selection approach based on a hybrid between genetic algorithm and support vector machine. The major goal of this hybridization was to exploit fully their respective merits (e.g., robustness to the size of solution space and capability of handling a very large dimension of feature genes) for identification of key feature genes (or molecular signatures) for a complex biological phenotype. We have applied the approach to the microarray data of diffuse large B cell lymphoma to demonstrate its behaviors and properties for mining the high-dimension data of genome-wide gene expression profiles. The resulting classifier(s) (the optimal gene subset(s)) has achieved the highest accuracy (99%) for prediction of independent microarray samples in comparisons with marginal filters and a hybrid between genetic algorithm and K nearest neighbors.
Related Topics
Life Sciences Biochemistry, Genetics and Molecular Biology Genetics
Authors
, , , , , , , , , ,