A novel kernel Fisher discriminant analysis: Constructing informative kernel by decision tree ensemble for metabolomics data analysis

Article ID	Journal	Published Year	Pages	File Type
1166496	Analytica Chimica Acta	2011	8 Pages	PDF

Abstract

High-throughput metabolomics experiments produce large and increasingly complex datasets, which pose considerable challenges for existing statistical modeling approaches. There is therefore a need for statistically efficient approaches to mining the underlying metabolite information contained in the metabolomics data under investigation. In this work, we developed a novel kernel Fisher discriminant analysis (KFDA) algorithm by constructing an informative kernel based on a decision tree ensemble. The constructed kernel can effectively encode the similarities between metabolomics samples with respect to informative metabolites/biomarkers in specific parts of the measurement space. At the same time, informative metabolites or potential biomarkers can be discovered through variable importance ranking while the kernel is being built. Moreover, with such a kernel, KFDA can also handle nonlinear relationships in the metabolomics data to some extent. Finally, two real metabolomics datasets and one simulated dataset were used to demonstrate the performance of the proposed approach in comparison with other approaches.
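
The abstract does not state how the kernel is derived from the tree ensemble. As a minimal sketch, assuming the kernel is the familiar leaf-sharing proximity (the fraction of trees in which two samples fall into the same terminal node), it could be computed as follows. The function name `leaf_proximity_kernel` and the example leaf assignments are hypothetical and serve only as illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def leaf_proximity_kernel(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build a sample-by-sample similarity matrix from leaf assignments.

    leaves[i][t] is the index of the terminal node (leaf) that sample i
    reaches in tree t. The returned K[i][j] is the fraction of trees in
    which samples i and j end up in the same leaf (an assumed construction).
    """
    n = len(leaves)
    if n == 0:
        return []
    n_trees = len(leaves[0])
    if n_trees == 0 or any(len(row) != n_trees for row in leaves):
        raise ValueError("every sample needs one leaf index per tree")

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees in which both samples share a leaf.
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            kernel[i][j] = kernel[j][i] = shared / n_trees
    return kernel


# Hypothetical leaf assignments for 4 samples in an ensemble of 3 trees.
example_leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 0, 3],
    [1, 0, 1],
]
K = leaf_proximity_kernel(example_leaves)
```

Each tree contributes a block-structured indicator matrix, so the averaged matrix is symmetric and positive semidefinite and could in principle be passed to a kernel method such as KFDA as a precomputed kernel. With scikit-learn, per-tree leaf indices of this shape can be obtained from `RandomForestClassifier.apply`; whether the authors used a random forest or another CART ensemble is not stated in the abstract.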

Highlights

► We developed a novel KFDA algorithm by constructing an informative kernel based on a decision tree ensemble.
► The kernel can encode the similarities of metabolomics samples in the space of biomarkers.
► With such a kernel, KFDA can also handle nonlinear relationships in the metabolomics data.
► Two real datasets and one simulated dataset demonstrated the performance of KFDA.

Keywords

Decision tree; Kernel methods; Classification and regression tree (CART); Metabolomics; Biomarker discovery