Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
1163144 | Analytica Chimica Acta | 2016 | 12 Pages |
•A new and efficient variable selection method for PLS, based on the clustering of variables concept, is suggested.
•Selecting the most useful variables is simple and straightforward.
•The CLoVA concept can be used as an alternative to interval-based variable selection for PLS.
•Analyses of different data sets indicate the superiority of CLoVA over available variable selection algorithms.
We recently proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). Here the same concept is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into clusters by a clustering algorithm; the spectral data of each cluster are then subjected to PLS regression. Several real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of variable clustering on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part of the data from the redundant part, and that a stable model can then be built on the informative clusters.
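The cluster-then-regress idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the clustering algorithm, the PLS variant, or the selection criterion, so the plain k-means over column profiles, the NIPALS-style PLS1, and the ranking of clusters by validation RMSE are all assumptions. The function names (`kmeans_columns`, `pls1_fit`, `pls1_predict`, `clova_pls`) are hypothetical.

```python
import numpy as np

def kmeans_columns(X, n_clusters, n_iter=50, seed=0):
    """Cluster the columns (variables) of X with a plain k-means.
    Assumption: k-means stands in for whatever clustering the paper uses."""
    rng = np.random.default_rng(seed)
    V = X.T  # one row per variable: its profile across samples
    centers = V[rng.choice(len(V), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = V[labels == k].mean(axis=0)
    return labels

def pls1_fit(X, y, n_comp):
    """Single-response PLS (NIPALS-style); returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # no covariance left to model
            break
        w = w / norm
        t = Xr @ w                 # scores
        tt = t @ t
        p = Xr.T @ t / tt          # X loadings
        c = yr @ t / tt            # y loading
        Xr = Xr - np.outer(t, p)   # deflate X
        yr = yr - c * t            # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def clova_pls(X_cal, y_cal, X_val, y_val, n_clusters=3, n_comp=2):
    """Fit one PLS model per variable cluster and rank clusters by validation RMSE.
    Assumption: the lowest validation RMSE marks the 'informative' cluster."""
    labels = kmeans_columns(X_cal, n_clusters)
    results = {}
    for k in np.unique(labels):
        cols = np.flatnonzero(labels == k)
        model = pls1_fit(X_cal[:, cols], y_cal, min(n_comp, len(cols)))
        resid = pls1_predict(model, X_val[:, cols]) - y_val
        results[int(k)] = (cols, float(np.sqrt(np.mean(resid ** 2))))
    best = min(results, key=lambda k: results[k][1])
    return best, results
```

Calling `clova_pls(X_cal, y_cal, X_val, y_val)` returns the label of the best-scoring cluster together with the column indices and validation RMSE of every cluster. A synergy variant in the spirit of sCLoVA-PLS could be sketched by scoring unions of clusters the same way, though how the authors combine clusters is not stated in the abstract.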