Positive vectors clustering using inverted Dirichlet finite mixture models

Article ID	Journal	Published Year	Pages	File Type
388086	Expert Systems with Applications	2012	14 Pages	PDF

Abstract

In this work we present an unsupervised algorithm for learning finite mixture models from multivariate positive data. Such data arise naturally in many applications, yet they have not been adequately addressed in the past. The mixture model is based on the inverted Dirichlet distribution, which offers a good representation of positive non-Gaussian data. The parameters of the inverted Dirichlet mixture are estimated by maximum likelihood (ML) using the Newton-Raphson method. We also develop an approach, based on the minimum message length (MML) criterion, to select the optimal number of clusters to represent the data using such a mixture. Experimental results are presented using artificial histograms and real data sets. The challenging problem of software module classification is also investigated within the proposed statistical framework.
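
For readers unfamiliar with the model, the following is a minimal sketch of the quantity that maximum likelihood estimation would maximize. The abstract does not state the density, so the code assumes the standard form of the inverted Dirichlet for a positive vector x of dimension D with D + 1 positive parameters: p(x | alpha) = Gamma(|alpha|) / prod_d Gamma(alpha_d) * prod_{d<=D} x_d^(alpha_d - 1) * (1 + sum_d x_d)^(-|alpha|), where |alpha| is the sum of all D + 1 parameters. The function names are hypothetical and are not taken from the paper; the Newton-Raphson updates and the MML criterion are not reproduced here.

```python
import math


def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of the (assumed standard) inverted Dirichlet.

    x     : sequence of D strictly positive values
    alpha : sequence of D + 1 strictly positive parameters
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have one more entry than x")
    if min(x) <= 0 or min(alpha) <= 0:
        raise ValueError("x and alpha must be strictly positive")
    total = sum(alpha)  # |alpha|
    # Log of the normalizing constant Gamma(|alpha|) / prod Gamma(alpha_d).
    log_norm = math.lgamma(total) - sum(math.lgamma(a) for a in alpha)
    # zip stops after D pairs, so alpha[D] only enters through `total`.
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_norm + log_kernel - total * math.log1p(sum(x))


def mixture_loglik(data, weights, alphas):
    """Log-likelihood of `data` under a finite mixture of such components.

    weights : mixing proportions (positive, summing to one)
    alphas  : one parameter vector per component
    """
    loglik = 0.0
    for x in data:
        terms = [
            math.log(w) + inverted_dirichlet_logpdf(x, a)
            for w, a in zip(weights, alphas)
        ]
        # Log-sum-exp keeps the per-point sum numerically stable.
        m = max(terms)
        loglik += m + math.log(sum(math.exp(t - m) for t in terms))
    return loglik
```

For D = 1 this density reduces to the beta prime distribution, which gives a quick sanity check on the implementation. Working in log space is a design choice rather than something the abstract prescribes: it avoids underflow when the data are high-dimensional.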

► An algorithm for estimating finite inverted Dirichlet mixture parameters is proposed.
► An approach for model selection using minimum message length is developed.
► The model is applied to the challenging problem of software module categorization.

Keywords

MML; Maximum likelihood; Data clustering; Mixture models; Unsupervised learning