Article code: 1150614
Journal code: 957960
Publication year: 2007
English article: 19 pages, PDF
Full-text version: Free download
English title of the ISI article
A comparative study of the K-means algorithm and the normal mixture model for clustering: Univariate case
Related topics
Engineering and Basic Sciences, Mathematics, Applied Mathematics
English abstract

This paper gives a comparative study of the K-means algorithm and the mixture model (MM) method for clustering normal data. The EM algorithm is used to compute the maximum likelihood estimators (MLEs) of the parameters of the MM model. These parameters include mixing proportions, which may be thought of as the prior probabilities of different clusters; the maximum posterior (Bayes) rule is used for clustering. Hence, asymptotically the MM method approaches the Bayes rule for known parameters, which is optimal in terms of minimizing the expected misclassification rate (EMCR). The paper gives a thorough analytic comparison of the two methods for the univariate case under both homoscedasticity and heteroscedasticity. Simulation results are given to compare the two methods for a range of sample sizes. The comparison, which is limited to two clusters, shows that the MM method has substantially lower EMCR particularly when the mixing proportions are unbalanced. The two methods have asymptotically the same EMCR under homoscedasticity (resp., heteroscedasticity) when the mixing proportions of the two clusters are equal (resp., unequal), but for small samples the MM method sometimes performs slightly worse because of the errors in estimating unknown parameters.
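
The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's code: the sample size, mixing proportion (0.8), means (0 and 3), common standard deviation (1), seed, and all function names are assumptions chosen only for illustration. It fits two-cluster K-means and a two-component homoscedastic normal mixture (via EM, started from the K-means centers) to univariate data, then classifies with the midpoint rule and the maximum posterior rule respectively.

```python
import math
import random


def simulate(n, p, mu1, mu2, sigma, seed=0):
    """Draw n points from p*N(mu1, sigma^2) + (1-p)*N(mu2, sigma^2)."""
    rng = random.Random(seed)
    xs, labels = [], []
    for _ in range(n):
        z = 0 if rng.random() < p else 1
        xs.append(rng.gauss(mu1 if z == 0 else mu2, sigma))
        labels.append(z)
    return xs, labels


def kmeans_1d(xs, iters=100):
    """Two-cluster K-means; in one dimension the rule is a midpoint cut."""
    c1, c2 = min(xs), max(xs)
    for _ in range(iters):
        cut = (c1 + c2) / 2
        left = [x for x in xs if x <= cut]
        right = [x for x in xs if x > cut]
        n1, n2 = sum(left) / len(left), sum(right) / len(right)
        if n1 == c1 and n2 == c2:
            break  # centers stopped moving
        c1, c2 = n1, n2
    return c1, c2


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def em_two_normals(xs, mu1, mu2, iters=200):
    """EM for a two-component normal mixture with a common variance."""
    n = len(xs)
    p = 0.5
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # crude start
    for _ in range(iters):
        # E-step: posterior probability that each point is in component 1.
        w = []
        for x in xs:
            a = p * normal_pdf(x, mu1, sigma)
            b = (1 - p) * normal_pdf(x, mu2, sigma)
            w.append(a / (a + b))
        # M-step: weighted updates of proportion, means, pooled variance.
        s1 = sum(w)
        s2 = n - s1
        p = s1 / n
        mu1 = sum(wi * x for wi, x in zip(w, xs)) / s1
        mu2 = sum((1 - wi) * x for wi, x in zip(w, xs)) / s2
        var = sum(wi * (x - mu1) ** 2 + (1 - wi) * (x - mu2) ** 2
                  for wi, x in zip(w, xs)) / n
        sigma = math.sqrt(var)
    return p, mu1, mu2, sigma


def error_rate(pred, truth):
    return sum(a != b for a, b in zip(pred, truth)) / len(truth)


# Assumed illustrative setting: unbalanced proportions, equal variances.
xs, truth = simulate(n=2000, p=0.8, mu1=0.0, mu2=3.0, sigma=1.0)

c1, c2 = kmeans_1d(xs)
km_pred = [0 if x <= (c1 + c2) / 2 else 1 for x in xs]

p_hat, m1, m2, s = em_two_normals(xs, c1, c2)
mm_pred = [0 if p_hat * normal_pdf(x, m1, s) >= (1 - p_hat) * normal_pdf(x, m2, s)
           else 1 for x in xs]

km_err = error_rate(km_pred, truth)
mm_err = error_rate(mm_pred, truth)
print(f"K-means misclassification rate: {km_err:.4f}")
print(f"Mixture model misclassification rate: {mm_err:.4f}")
print(f"Estimated mixing proportion: {p_hat:.3f}")
```

The K-means cut always sits halfway between the two cluster centers, whereas the maximum posterior rule shifts the cut toward the smaller cluster by an amount that depends on the estimated mixing proportion. That shift is one way to see why the abstract reports a larger gap between the methods when the proportions are unbalanced.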

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Statistical Planning and Inference - Volume 137, Issue 11, 1 November 2007, Pages 3722–3740