Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
417771 | Computational Statistics & Data Analysis | 2010 | 15 Pages | |
Abstract
We study a general algorithm that improves the accuracy of cluster analysis by exploiting the James–Stein shrinkage effect in k-means clustering. The centroids of the clusters are shrunk toward the overall mean of all data using a James–Stein-type adjustment, and the resulting shrinkage estimators serve as the new centroids in the next clustering iteration; this repeats until convergence. We compare the shrinkage results to the traditional k-means method. A Monte Carlo simulation shows that the magnitude of the improvement depends on the within-cluster variance and especially on the effective dimension of the covariance matrix. Using the Rand index, we demonstrate that accuracy increases significantly in simulated data and in a real data example.
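The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact shrinkage factor, so the code assumes a spherical within-cluster covariance, treats the overall mean as a fixed target, and uses a positive-part factor with the plain dimension `p` where the paper refers to an effective dimension. The function names `shrink_centroids`, `js_kmeans`, and `rand_index` are hypothetical.

```python
import numpy as np


def shrink_centroids(X, labels, k):
    """Shrink each cluster mean toward the overall mean.

    ASSUMPTION: spherical covariance and a positive-part James-Stein-type
    factor 1 - (p - 2) * sigma2 / (n_j * ||mean_j - grand_mean||^2).
    The paper's exact adjustment may differ.
    """
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    # Empty clusters fall back to the overall mean.
    means = np.tile(grand_mean, (k, 1))
    for j in range(k):
        if counts[j] > 0:
            means[j] = X[labels == j].mean(axis=0)
    # Pooled within-cluster variance per coordinate.
    resid = X - means[labels]
    sigma2 = (resid ** 2).sum() / max((n - k) * p, 1)
    shrunk = means.copy()
    for j in range(k):
        d2 = ((means[j] - grand_mean) ** 2).sum()
        if counts[j] == 0 or d2 == 0.0:
            continue
        # max(p - 2, 0) prevents expansion in low dimensions;
        # the outer max keeps the factor non-negative (positive part).
        factor = max(0.0, 1.0 - max(p - 2, 0) * sigma2 / (counts[j] * d2))
        shrunk[j] = grand_mean + factor * (means[j] - grand_mean)
    return shrunk


def js_kmeans(X, k, max_iter=100, seed=0):
    """k-means in which the shrunken centroids drive the next assignment."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        centroids = shrink_centroids(X, labels, k)
    return labels, centroids


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float((same_a[upper] == same_b[upper]).mean())
```

With known true labels, `rand_index(true_labels, js_kmeans(X, k)[0])` gives the kind of accuracy score the abstract refers to; the same call on ordinary k-means output provides the comparison baseline.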
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Jinxin Gao, David B. Hitchcock