Article ID: 849163
Journal: Optik - International Journal for Light and Electron Optics
Published Year: 2014
Pages: 7
File Type: PDF
Abstract

Clustering gene expression data is an important research topic in bioinformatics because knowing which genes act similarly can lead to the discovery of important biological information. Many clustering algorithms have been applied to gene clustering. The multivariate Gaussian distribution has frequently been used as the component density of finite mixture models for clustering; however, clusters in real datasets cannot be assumed to follow the normal distribution. To make the clustering algorithm more adaptable, this paper proposes a new scheme for clustering gene expression data based on multivariate elliptical contoured mixture models (MECMMs). To reduce the over-reliance on initialization, we propose an improved expectation maximization (EM) algorithm that adds and deletes initial values in the classical EM algorithm, and the number of clusters is treated as a parameter and inferred with the QAIC criterion. The improved EM algorithm based on MECMMs is tested and extensively compared with other clustering algorithms on several simulated and real gene expression datasets. Our results indicate that the improved EM clustering algorithm is superior to the classical EM algorithm and the support vector machine (SVM) algorithm, and can be widely used for gene clustering.
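
To make the general idea concrete, the following is a minimal sketch of mixture-model clustering with the classical EM algorithm, with the number of clusters chosen by an information criterion. It is not the paper's method: it assumes one-dimensional Gaussian components instead of MECMMs, uses plain AIC as a stand-in for QAIC, and does not reproduce the proposed adding and deleting of initial values. The function names (`e_step`, `fit_mixture`, `aic`, `select_k`) and the quantile-based initialization are hypothetical choices made only for this illustration.

```python
import math
import random


def e_step(data, weights, means, variances):
    """Return per-point responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        # log(weight_j * N(x | mean_j, var_j)) for every component j
        terms = [
            math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(terms)
        log_sum = top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp
        loglik += log_sum
        resp.append([math.exp(t - log_sum) for t in terms])
    return resp, loglik


def fit_mixture(data, k, n_iter=200, tol=1e-8, var_floor=1e-6):
    """Classical EM for a k-component 1-D Gaussian mixture (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: evenly spaced sample quantiles, pooled variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    pooled_var = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    variances = [pooled_var] * k
    weights = [1.0 / k] * k
    resp, loglik = e_step(data, weights, means, variances)
    history = [loglik]
    for _ in range(n_iter):
        for j in range(k):  # M-step: weighted maximum-likelihood updates
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)
        resp, loglik = e_step(data, weights, means, variances)
        history.append(loglik)
        if loglik - history[-2] < tol:  # stop once the likelihood stalls
            break
    return {"weights": weights, "means": means, "variances": variances,
            "loglik": loglik, "history": history}


def aic(fit, k):
    """AIC = 2p - 2 log L, with p = 3k - 1 free parameters in one dimension."""
    return 2.0 * (3 * k - 1) - 2.0 * fit["loglik"]


def select_k(data, candidates):
    """Fit each candidate k and return (best_k, {k: AIC})."""
    scores = {k: aic(fit_mixture(data, k), k) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for expression values: two well-separated groups.
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    sample += [rng.gauss(10.0, 1.0) for _ in range(100)]
    best_k, scores = select_k(sample, [1, 2, 3])
    print("selected k:", best_k)
```

Each EM iteration alternates between computing soft cluster memberships (E-step) and re-estimating the component parameters from them (M-step), so the log-likelihood recorded in `history` should not decrease. Swapping the Gaussian log-density in `e_step` for another elliptically contoured density would be the natural place to move toward the mixture family the abstract describes.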
