Tree-based ensemble methods and their applications in analytical chemistry

Article ID	Journal	Published Year	Pages	File Type
1249200	TrAC Trends in Analytical Chemistry	2012	10 Pages	PDF

Abstract

Data from high-throughput analytical instruments have grown steadily larger and more complex, bringing a number of challenges to statistical modeling. To understand complex data further, new statistically efficient approaches are urgently needed to: (1) select salient features from the data; (2) discard uninformative data; (3) detect outlying samples in data; (4) visualize existing patterns of the data; (5) improve the prediction accuracy of the data; and, finally, (6) feed back to the analyst understandable summaries of information from the data. We review current developments in tree-based ensemble methods to mine effectively the knowledge hidden in chemical and biological data. We report on applications of these algorithms to variable selection, outlier detection, supervised pattern analysis, cluster analysis, and tree-based kernel and ensemble learning. Through this report, we wish to inspire chemists to take greater interest in decision trees and to obtain greater benefits from using tree-based ensemble techniques.

► Decision trees can perform automatic stepwise variable selection and complexity reduction.
► Decision trees can cope effectively with complex chemical data.
► Tree-based ensemble approaches can be applied to solve various chemometric problems.

Keywords

Ensemble algorithm; Cluster analysis; Variable selection; Pattern analysis; Outlier detection; Complex data; Kernel method; Classification and regression tree (CART); Chemometrics