Article ID: 5132188
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2017
Pages: 6 Pages
File Type: PDF
Abstract

• Provides a 2-dimensional assessment of variable importance through a Monte Carlo sampling technique.
• The output of the proposed method allows for intuitive identification of important variables/genes.
• A statistical test is used to assess whether the effects of variables on the predictive performance of models are significant.

Identifying a small subset of genes that can distinguish disease samples from healthy controls plays an important role in evaluating disease risk and facilitating diagnosis. Existing methods often provide a single metric to assess the predictive performance of genes. Moreover, model-based gene importance is conditioned on the subset of genes used to build multivariate models, and is thus model/context-specific. Existing methods often do not take such context-specific effects into account. Here we present a novel gene selection approach that evaluates the predictive performance of genes using two criteria, taking gene interactions into account, and projects the genes onto four different regions in a 2-dimensional plot, like a phase diagram (PHADIA) in chemistry. Using two publicly available microarray datasets, we show that PHADIA achieves classification accuracies comparable to or better than those reported in the literature. The source code is freely available at: www.libpls.net.
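
The general idea of scoring variables over a population of Monte Carlo sub-models can be sketched as follows. This is a minimal illustration only, not the PHADIA implementation: the abstract does not define the two criteria or the four regions, so the choices here (difference in mean test error between models that exclude and include a variable, a rough Welch-type t statistic with an arbitrary cutoff of 2, a nearest-centroid classifier, synthetic data, and all function names) are assumptions made for the example.

```python
import math
import random
import statistics


def nearest_centroid_error(train, test, cols):
    """Misclassification rate of a nearest-centroid classifier using only `cols`."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.mean(r[c] for r in rows) for c in cols]

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])),
        )

    return sum(predict(x) != y for x, y in test) / len(test)


def monte_carlo_importance(data, n_models=300, test_fraction=0.3, t_cut=2.0, seed=0):
    """Score each variable from many random sub-models (hypothetical criteria)."""
    rng = random.Random(seed)
    p = len(data[0][0])
    n_test = max(1, int(test_fraction * len(data)))
    included = [[] for _ in range(p)]  # test errors of models containing variable j
    excluded = [[] for _ in range(p)]  # test errors of models lacking variable j
    for _ in range(n_models):
        # Each variable enters a sub-model independently with probability 0.5.
        cols = [j for j in range(p) if rng.random() < 0.5]
        if not cols:
            continue
        shuffled = rng.sample(data, len(data))  # random train/test split
        err = nearest_centroid_error(shuffled[n_test:], shuffled[:n_test], cols)
        for j in range(p):
            (included if j in cols else excluded)[j].append(err)

    results = []
    for j in range(p):
        inc, exc = included[j], excluded[j]
        if len(inc) < 2 or len(exc) < 2:
            continue
        # Assumed criterion 1: positive diff means including the variable lowers error.
        diff = statistics.mean(exc) - statistics.mean(inc)
        # Assumed criterion 2: Welch-type t statistic as a rough significance check.
        se = math.sqrt(statistics.variance(inc) / len(inc)
                       + statistics.variance(exc) / len(exc))
        t = diff / se if se > 0 else 0.0
        effect = "helps" if diff > 0 else "hurts"
        signif = "significant" if abs(t) >= t_cut else "not significant"
        results.append({"variable": j, "diff": diff, "t": t,
                        "region": f"{effect}, {signif}"})
    return results


# Synthetic data: 40 samples, 6 variables; only variable 0 separates the classes.
data_rng = random.Random(1)
data = []
for label in (0, 1):
    for _ in range(20):
        x = [data_rng.gauss(0.0, 1.0) for _ in range(6)]
        x[0] += 3.0 * label
        data.append((x, label))

results = monte_carlo_importance(data)
for r in results:
    print(r["variable"], round(r["diff"], 3), round(r["t"], 2), r["region"])
```

Plotting `diff` against `t` for every variable gives a 2-dimensional view in which the sign of the effect and its significance split the plane into four regions. Because each variable is scored across many different sub-models, the score reflects its contribution in varying contexts rather than in a single fixed model.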
