A comparative study of the K-means algorithm and the normal mixture model for clustering: Univariate case

Article code	Journal code	Publication year	English article	Full-text version
1150614	957960	2007	19-page PDF	Free download

Keywords

62H30; EM algorithm; K-means algorithm; Clustering; Data mining; Mixture model; Misclassification rate

Related topics

Engineering and Basic Sciences, Mathematics, Applied Mathematics

Abstract

This paper gives a comparative study of the K-means algorithm and the mixture model (MM) method for clustering normal data. The EM algorithm is used to compute the maximum likelihood estimators (MLEs) of the parameters of the MM model. These parameters include mixing proportions, which may be thought of as the prior probabilities of different clusters; the maximum posterior (Bayes) rule is used for clustering. Hence, asymptotically the MM method approaches the Bayes rule for known parameters, which is optimal in terms of minimizing the expected misclassification rate (EMCR). The paper gives a thorough analytic comparison of the two methods for the univariate case under both homoscedasticity and heteroscedasticity. Simulation results are given to compare the two methods for a range of sample sizes. The comparison, which is limited to two clusters, shows that the MM method has substantially lower EMCR, particularly when the mixing proportions are unbalanced. The two methods have asymptotically the same EMCR under homoscedasticity (resp., heteroscedasticity) when the mixing proportions of the two clusters are equal (resp., unequal), but for small samples the MM method sometimes performs slightly worse because of the errors in estimating unknown parameters.
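
The comparison described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes NumPy, a two-component univariate normal mixture with hypothetical parameters (means 0 and 3, unit variances, mixing proportions 0.9 and 0.1), a min/max initialization for K-means, and K-means output as the starting point for EM. All function names are illustrative.

```python
import numpy as np

def kmeans_1d(x, n_iter=100):
    """Two-cluster K-means (Lloyd iterations) on univariate data."""
    # Assumed initialization: smallest and largest observations.
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each point to the nearer center (label 0 = left cluster).
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        new_centers = np.array([x[labels == k].mean() for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posteriors(x, weights, mu, sigma):
    """Posterior cluster probabilities under the current mixture parameters."""
    dens = np.stack(
        [weights[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)], axis=1
    )
    return dens / dens.sum(axis=1, keepdims=True)

def em_mixture_1d(x, n_iter=200, equal_var=False):
    """EM for a two-component normal mixture, then the maximum posterior rule."""
    labels, mu = kmeans_1d(x)  # assumed starting values
    weights = np.array([np.mean(labels == k) for k in (0, 1)])
    sigma = np.maximum([x[labels == k].std() for k in (0, 1)], 1e-3)
    for _ in range(n_iter):
        resp = posteriors(x, weights, mu, sigma)            # E-step
        nk = resp.sum(axis=0)                               # M-step
        weights = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        if equal_var:
            # Homoscedastic case: pool the two variance estimates.
            var = np.full(2, (nk * var).sum() / len(x))
        sigma = np.sqrt(np.maximum(var, 1e-6))
    labels = posteriors(x, weights, mu, sigma).argmax(axis=1)
    return labels, weights, mu, sigma

def misclassification_rate(labels, truth):
    """Error rate, allowing for the arbitrary naming of the two clusters."""
    err = np.mean(labels != truth)
    return min(err, 1.0 - err)

# Simulated data with unbalanced mixing proportions (hypothetical settings).
rng = np.random.default_rng(0)
n, p0 = 2000, 0.9
z = (rng.random(n) >= p0).astype(int)          # true cluster: 0 w.p. 0.9
x = rng.normal(loc=np.where(z == 0, 0.0, 3.0), scale=1.0)

km_labels, km_centers = kmeans_1d(x)
mm_labels, w_hat, mu_hat, sigma_hat = em_mixture_1d(x)
km_err = misclassification_rate(km_labels, z)
mm_err = misclassification_rate(mm_labels, z)
print(f"K-means error: {km_err:.3f}, mixture model error: {mm_err:.3f}")
```

In this setting the K-means boundary sits midway between the two cluster means, while the mixture model shifts its boundary toward the smaller cluster through the estimated mixing proportions. Passing `equal_var=True` gives the homoscedastic variant; a single run like this is only an illustration and does not estimate the EMCR, which would require averaging over many replications.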

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Statistical Planning and Inference - Volume 137, Issue 11, 1 November 2007, Pages 3722–3740

Authors

Dingxi Qiu, Ajit C. Tamhane
