کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
416567 | 681384 | 2014 | 9 صفحه PDF | دانلود رایگان |
This paper is concerned with hierarchical clustering of long binary sequence data. We propose two alternative improvements of the EM algorithm used in Chen and Lindsay (2006). One is the FixEM . It is just the regular EM but we no longer update the weights πsπs used in the ancestral mixture models. The other is the ModalEM. In this we cluster data according to the modes of an estimated density function for the data. In order to compare these methods with each other and other popular hierarchical clustering methods, we use a data example from the international HapMap project. We compare the speed and the ability of these methods to separate out true clusters. In addition, simulation studies are performed to compare the efficiency and accuracy of these methods. Our conclusion is that the new EM methods are far superior to the original, and that both provide useful alternatives to other standard clustering methods.
Journal: Computational Statistics & Data Analysis - Volume 74, June 2014, Pages 17–25