Article ID Journal Published Year Pages File Type
416567 Computational Statistics & Data Analysis 2014 9 Pages PDF
Abstract

This paper is concerned with hierarchical clustering of long binary sequence data. We propose two alternative improvements of the EM algorithm used in Chen and Lindsay (2006). The first is FixEM, which is the regular EM algorithm except that the weights π_s used in the ancestral mixture models are no longer updated. The second is ModalEM, which clusters the data according to the modes of an estimated density function for the data. To compare these methods with each other and with other popular hierarchical clustering methods, we use a data example from the International HapMap Project, comparing both their speed and their ability to separate out true clusters. In addition, simulation studies are performed to compare the efficiency and accuracy of these methods. Our conclusion is that the new EM methods are far superior to the original, and that both provide useful alternatives to other standard clustering methods.
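
To make the FixEM idea concrete, the following is a minimal sketch of an EM loop in which the mixing weights are held fixed and only the component parameters are re-estimated. It assumes a plain Bernoulli mixture over binary sequences, which is a simplification and not the ancestral mixture model of Chen and Lindsay (2006). The function name `fixed_weight_em` and its parameters (`centers`, `weights`, `n_iter`, `eps`) are hypothetical and chosen only for illustration.

```python
import math


def fixed_weight_em(data, centers, weights, n_iter=50, eps=1e-6):
    """EM for a Bernoulli mixture with mixing weights held fixed.

    Illustrative assumption: each component j has independent per-site
    probabilities theta[j][i] of observing a 1. This is not the paper's
    ancestral mixture model. Requires n_iter >= 1.
    """
    k = len(centers)
    d = len(data[0])
    # Clip the starting values so that log() is always defined.
    theta = [[min(max(t, eps), 1.0 - eps) for t in c] for c in centers]

    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        resp = []
        for x in data:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for xi, t in zip(x, theta[j]):
                    lp += math.log(t) if xi else math.log(1.0 - t)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            total = sum(unnorm)
            resp.append([v / total for v in unnorm])

        # M-step: update the component parameters only.
        # The weights are deliberately left untouched.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            for i in range(d):
                num = sum(r[j] * x[i] for r, x in zip(resp, data))
                t = num / nj if nj > 0 else 0.5
                theta[j][i] = min(max(t, eps), 1.0 - eps)

    # Hard assignment: each sequence goes to its most probable component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return theta, labels
```

Compared with a regular EM iteration, the only change is the omitted weight update in the M-step, which removes one estimation step per iteration.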

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics