Model selection for partial least squares based dimension reduction

Article ID	Journal	Published Year	Pages	File Type
535836	Pattern Recognition Letters	2012	6 Pages	PDF

Abstract

Partial least squares (PLS) has been widely applied to scientific data sets as an effective dimension reduction technique. The number of dimensions extracted by PLS is usually determined by cross validation, which is computationally expensive. Earlier work proposed fixing the number at three, but intuitively this is not suitable for all data sets. Based on the intrinsic connection between PLS and the structure of data sets, two novel algorithms are proposed to determine the number of extracted principal components, keeping the valuable information while excluding the trivial. Both algorithms adapt to different data sets and are easy to implement, and both perform better than previous approaches on nine real-world data sets.
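
To make the cross validation baseline mentioned above concrete, the following is a minimal sketch of choosing the number of PLS components by k-fold cross validation. It is not one of the two algorithms proposed in the paper. It assumes a single continuous response (PLS1 fitted with NIPALS), mean squared error as the selection criterion, and synthetic data; the function names `pls1_fit` and `select_n_components` are hypothetical and introduced only for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) with NIPALS; return coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left in X that covaries with y
            break
        w = w / norm
        t = Xk @ w                    # score vector
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:                         # no component could be extracted
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients in X space
    return beta, x_mean, y_mean


def select_n_components(X, y, max_components, n_folds=5, seed=0):
    """Return the component count with the lowest cross-validated MSE, plus all MSEs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_mse = []
    for k in range(1, max_components + 1):
        sq_err = 0.0
        for test_idx in folds:
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False
            # The model is refitted for every (k, fold) pair.
            beta, x_mean, y_mean = pls1_fit(X[train], y[train], k)
            pred = (X[test_idx] - x_mean) @ beta + y_mean
            sq_err += float(np.sum((y[test_idx] - pred) ** 2))
        cv_mse.append(sq_err / len(y))
    return int(np.argmin(cv_mse)) + 1, cv_mse


# Synthetic example (assumed data): 20 observed variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
X = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(60, 20))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=60)

best_k, cv_mse = select_n_components(X, y, max_components=6)
print(best_k, np.round(cv_mse, 4))
```

The sketch refits the model `max_components * n_folds` times, which illustrates why cross validation becomes costly on large data sets and why a cheaper selection rule is attractive.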

► Model selection for partial least squares (PLS) based dimension reduction is studied.
► The main approach is cross validation, but its computational load is heavy.
► Fixing the number of principal components at three is not appropriate for all data sets.
► Two novel algorithms are proposed based on the intrinsic structure of PLS.
► They perform better than previous approaches on nine scientific data sets.

Keywords

Model selection; Partial least squares; Dimension reduction