Article ID: 1180307
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2016
Pages: 6 Pages
File Type: PDF
Abstract

• Hotelling T²-based variable selection in partial least squares regression.
• Implementation of the methods on chemometric data sets.
• Stable parameter estimation through Monte Carlo simulation.

Advances in technology produce massive amounts of data in which the variables are usually correlated and outnumber the samples. Among this large number of variables, only a small subset is assumed to be informative. Hence, the selection of informative variables is a crucial issue. Partial least squares (PLS) based variable selection is one way to model such multivariate data. Variable selection usually targets improved prediction and additionally provides a better understanding of the fitted model. This article aims to provide a simpler model with acceptable prediction by improving an existing method called distribution based truncation in PLS (Trunc-PLS). Trunc-PLS is based on a particular PLS loading weight vector. Instead, we consider the loading weight matrix that comes from the optimum PLS model. Loading weights are eliminated using the multivariate Hotelling T² distribution, which uses the information from all PLS components in the model. Moreover, we have applied this variable elimination approach to two novel applications: a) modeling iodine value with the Raman spectra of salmon fish oil; b) modeling the percentage of cow milk in a mixture with matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) spectra. Furthermore, the proposed method is compared with existing PLS methods. The comparison demonstrates better prediction ability of the proposed method on both data sets with a smaller number of selected variables, which improves the interpretability of the model.
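
The idea of scoring variables by a Hotelling T² statistic computed on the PLS loading weight matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pls1_loading_weights`, `hotelling_t2`, `select_variables`) are hypothetical, the PLS fit is a plain NIPALS PLS1, and the cutoff is an empirical quantile of the T² values chosen for simplicity, since the abstract does not state the article's cutoff rule. Each variable corresponds to one row of the loading weight matrix W, and its T² value is the squared Mahalanobis distance of that row from the mean row.

```python
import numpy as np

def pls1_loading_weights(X, y, n_components):
    """Fit PLS1 with NIPALS and return the (p x A) loading weight matrix W."""
    X = X - X.mean(axis=0)          # centre predictors (creates a copy)
    y = y - y.mean()                # centre response (creates a copy)
    W = np.zeros((X.shape[1], n_components))
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)      # unit-length loading weight vector
        t = X @ w                   # scores
        tt = t @ t
        p = X.T @ t / tt            # X loadings
        q = y @ t / tt              # y loading
        X = X - np.outer(t, p)      # deflate X
        y = y - q * t               # deflate y
        W[:, a] = w
    return W

def hotelling_t2(W):
    """T² for each variable: Mahalanobis distance of its row of W from the mean row."""
    centered = W - W.mean(axis=0)
    S = np.atleast_2d(np.cov(W, rowvar=False))   # A x A covariance of the weights
    S_inv = np.linalg.pinv(S)
    return np.einsum("ij,jk,ik->i", centered, S_inv, centered)

def select_variables(X, y, n_components=2, quantile=0.9):
    """Keep variables whose T² exceeds an empirical quantile.

    Assumption: the quantile cutoff is a simplification for illustration,
    not the cutoff rule used in the article.
    """
    W = pls1_loading_weights(X, y, n_components)
    t2 = hotelling_t2(W)
    cutoff = np.quantile(t2, quantile)
    return np.flatnonzero(t2 > cutoff), t2
```

A call such as `selected, t2 = select_variables(X, y, n_components=3)` returns the indices of the retained variables along with every variable's T² value. In practice the number of components would come from the optimum PLS model (for example, chosen by cross-validation), and a distribution-based cutoff would replace the quantile used here.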

Related Topics
Physical Sciences and Engineering Chemistry Analytical Chemistry