Article ID: 1181645
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2008
Pages: 7 Pages
File Type: PDF
Abstract
Unsupervised cluster analysis is frequently used to explore the structure of large datasets. This paper introduces a new method, DiPCA_Cluster, an optimal alternative to DiPLS_Cluster with two additional refinements. DiPLS_Cluster is not optimal: it does not converge to a unique solution (the result depends on the algorithm's initialization) and, moreover, it produces dendrograms that are not very meaningful. The proposed method, DiPCA_Cluster, is optimal with regard to the chosen criterion, inertia. Furthermore, at each step of the analysis, the data must be split into two groups according to the values of a 0/1 vector. In our method, unlike in DiPLS_Cluster, this split is based on a threshold value, which is more in keeping with the general problem; to this end, two proposals are suggested. Finally, whereas DiPLS_Cluster builds the dendrogram from the number of iterations, DiPCA_Cluster uses the total variance explained by each principal component within each group, which is better justified and yields dendrograms very close to those provided by the Ward criterion.
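The general idea of divisive clustering driven by principal components can be illustrated with a minimal sketch. This is not the authors' DiPCA_Cluster implementation: the function names (`first_pc`, `split_threshold`, `divisive_pca`), the rule for choosing which group to split, the two threshold rules (`"zero"` and `"scan"`), and the use of first-component inertia as the split height are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def first_pc(X):
    """Return (scores on the first principal component, variance it carries)."""
    Xc = X - X.mean(axis=0)                      # centre the group
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                          # projection on the first axis
    return scores, s[0] ** 2 / X.shape[0]

def split_threshold(scores, rule="zero"):
    """Hypothetical threshold rules; the paper's two proposals are not given here."""
    if rule == "zero":
        return 0.0                               # scores are centred, so 0 is their mean
    # "scan": cut point minimising the within-group sum of squares of the scores
    z = np.sort(scores)
    best_t, best_w = 0.0, np.inf
    for i in range(1, len(z)):
        left, right = z[:i], z[i:]
        w = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if w < best_w:
            best_w, best_t = w, (z[i - 1] + z[i]) / 2
    return best_t

def divisive_pca(X, n_clusters, rule="zero"):
    """Repeatedly bisect the group with the largest first-component inertia."""
    labels = np.zeros(X.shape[0], dtype=int)
    history = []                                 # (parent label, new label, height)
    for new_label in range(1, n_clusters):
        best = None
        for g in np.unique(labels):
            idx = np.flatnonzero(labels == g)
            if idx.size < 2:
                continue                         # a singleton cannot be split
            scores, var1 = first_pc(X[idx])
            inertia = var1 * idx.size            # assumed height of the split
            if best is None or inertia > best[0]:
                best = (inertia, g, idx, scores)
        if best is None:
            break
        inertia, g, idx, scores = best
        mask = scores > split_threshold(scores, rule)   # the 0/1 vector
        if mask.all() or not mask.any():
            break                                # degenerate split, stop
        labels[idx[mask]] = new_label
        history.append((int(g), new_label, float(inertia)))
    return labels, history
```

Calling `divisive_pca(X, n_clusters=3)` on an observations-by-variables array returns one integer label per row together with the list of splits and their assumed heights, which could then be drawn as a dendrogram.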
Related Topics
Physical Sciences and Engineering > Chemistry > Analytical Chemistry