Article ID: 5132295; Journal: Chemometrics and Intelligent Laboratory Systems; Published Year: 2017; Pages: 15; File Type: PDF
Abstract

• A statistic (P-PLS), based on the PRESS statistic, is introduced for PLS regression.
• The proposed method is illustrated by its application to four real-world data sets.
• This study contributes new elements for future theoretical development in PLS regression.

Choosing the right number of latent factors in Partial Least Squares (PLS) regression has long been a matter of concern among users, academics and researchers. In this paper, we introduce a statistic for selecting the appropriate number of latent factors according to the model's predictive ability. The method is based on the Predicted Residual Error Sum of Squares (PRESS) for PLS regression. Our mathematical development is based on matrix calculations obtained from the orthogonal vectors that compose the matrix of latent factors. Currently, leave-one-out cross-validation is widely used for this purpose: one observation is left out, the regression model is estimated on the remaining observations, and the procedure is repeated as many times as there are observations. The advantage of the PRESS statistic for PLS regression (P-PLS) developed in this work is that it allows the best predictive model to be selected straightforwardly. Additionally, P-PLS can be used to analyze the impact of the i-th observation on the PLS regression coefficient vector, as well as to detect other kinds of data that affect the model.
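
To make the leave-one-out baseline described in the abstract concrete, the following is a minimal sketch of brute-force PRESS computation for a single-response PLS model. It is not the P-PLS statistic of the paper, whose closed form is not given in the abstract. The NIPALS-style PLS1 fit, the function names (`pls1_fit`, `pls1_predict_path`, `loo_press`) and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with a NIPALS-style deflation (illustrative sketch)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered, progressively deflated
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:                     # no covariance left to extract
            break
        w /= norm
        t = E @ w                            # score vector, orthogonal to earlier scores
        tt = t @ t
        p = E.T @ t / tt                     # X loading
        q_a = f @ t / tt                     # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - q_a * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    return x_mean, y_mean, W, P, q


def pls1_predict_path(model, x_new):
    """Predictions for one sample using 1, 2, ..., A latent factors."""
    x_mean, y_mean, W, P, q = model
    e = x_new - x_mean
    y_hat, path = y_mean, []
    for w, p, q_a in zip(W, P, q):
        t = e @ w
        y_hat = y_hat + q_a * t
        e = e - t * p
        path.append(y_hat)
    return path


def loo_press(X, y, max_components):
    """PRESS for each number of factors, refitting once per left-out sample."""
    n = X.shape[0]
    press = np.zeros(max_components)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], max_components)
        path = pls1_predict_path(model, X[i])
        # If fitting stopped early, later factor counts reuse the last prediction.
        path += [path[-1] if path else model[1]] * (max_components - len(path))
        press += (y[i] - np.array(path)) ** 2
    return press


# Synthetic data, purely for illustration (not one of the paper's data sets).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=30)

press = loo_press(X, y, max_components=5)
best = int(np.argmin(press)) + 1             # number of factors with the smallest PRESS
print("PRESS per number of factors:", press)
print("selected number of factors:", best)
```

The sketch refits the model once per observation, which is the repeated estimation the abstract contrasts with a directly computed statistic. With as many factors as predictors and a full-rank predictor matrix, PLS coincides with ordinary least squares, so the last PRESS value should agree with the classical closed-form OLS PRESS computed from the hat matrix.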
