Article ID: 1179509
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2015
Pages: 13 Pages
File Type: PDF
Abstract

Partial least squares (PLS) and principal component regression (PCR), the most popular regression techniques in chemometrics, can in theory handle large numbers of possibly correlated variables, such as those arising in the analysis of spectroscopic data. Nevertheless, the importance of performing some form of variable selection in practical applications has been widely discussed and acknowledged. In this work we address this problem by proposing a sparse regression algorithm, referred to as fused stagewise regression (FSR), which iteratively selects connected regions of variables (wavelengths). The algorithm is easy to implement and interpret because it resembles the typical steps of iterative manual feature selection procedures. We evaluate the proposed variable selection technique on a publicly available benchmark data set and compare the performance of PLS models built on the resulting selection with that of models built on selections from state-of-the-art feature selection methods in chemometrics and machine learning. To ensure robust feature selection, we integrate the individual selection methods into an extensive repeated cross-validation procedure. For the data set under investigation, FSR is shown to perform at least as well as state-of-the-art approaches and well within the range of variable selections provided by experts.
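The abstract does not spell out the FSR update rule, so the following is only a minimal sketch of the general idea of a stagewise procedure that selects connected regions of variables. It is not the authors' algorithm. The function name `windowed_stagewise`, the fixed window `width`, the step size `step`, and the window scoring rule are all assumptions made for illustration. At each iteration the sketch finds the contiguous window of wavelengths whose summed correlation with the current residual is largest in magnitude, and nudges all coefficients in that window by a small step in the same direction.

```python
import numpy as np


def windowed_stagewise(X, y, width=5, step=0.01, n_iter=500):
    """Illustrative stagewise regression over contiguous variable windows.

    NOT the FSR algorithm from the paper: window width, step size and the
    scoring rule are assumptions chosen to keep the sketch short.
    Returns (coef, selected), where coef refers to the centered,
    unit-norm columns of X and selected holds the nonzero indices.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center the columns and scale them to unit norm so that the
    # inner products with the residual are comparable across variables.
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0
    Xs = Xc / norms

    residual = y - y.mean()
    coef = np.zeros(X.shape[1])
    kernel = np.ones(width)

    for _ in range(n_iter):
        # Inner product of every variable with the current residual.
        corr = Xs.T @ residual
        # Signed sum over each contiguous window of `width` variables.
        window_score = np.convolve(corr, kernel, mode="valid")
        start = int(np.argmax(np.abs(window_score)))
        sign = np.sign(window_score[start])
        if sign == 0:
            break  # no window is correlated with the residual any more
        idx = slice(start, start + width)
        # Move the whole window by one small step in the same direction.
        coef[idx] += step * sign
        residual -= step * sign * Xs[:, idx].sum(axis=1)

    selected = np.flatnonzero(coef)
    return coef, selected
```

Because every update touches a whole window, the nonzero coefficients form connected bands of wavelengths rather than isolated variables. In a study like the one described, such a selector would be run inside repeated cross-validation and the selected bands passed on to a PLS model; that outer loop is omitted here.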

Related Topics
Physical Sciences and Engineering > Chemistry > Analytical Chemistry