Article ID: 535006
Journal: Pattern Recognition Letters
Published Year: 2016
Pages: 6
File Type: PDF
Abstract

• We propose modifying the prior process over clusters by taking label information into account.
• We heuristically modify the prior process over clusters using a Polya urn model.
• We test on five real datasets with 30 random holdouts, obtaining good performance in comparison with other alternatives.
• We analyse five real datasets with respect to the MCMC learning process, considering log-likelihood and number of clusters.
• We recommend this variant because it performs better than clustering based on the Dirichlet Process.

Supervised clustering is an emerging area of machine learning, where the goal is to find class-uniform clusters. However, typical state-of-the-art algorithms use a fixed number of clusters. In this work, we propose a variation of non-parametric Bayesian modeling for supervised clustering. Our approach models the clusters as a mixture of Gaussians, with a constraint that encourages clusters of points sharing the same label. To estimate the number of clusters, we assume a priori a countably infinite number of clusters, using a variation of the Dirichlet Process model as the prior distribution. In our experiments, we show that our technique typically outperforms other clustering techniques.
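To make the idea of a label-aware prior more concrete, the following is a minimal sketch rather than the authors' method. `polya_urn_weights` computes the standard Polya urn (Chinese restaurant process) assignment probabilities that underlie a Dirichlet Process prior. `label_aware_urn_weights` is a hypothetical variant: the `penalty` parameter and the majority-label rule are assumptions made only for illustration, since the abstract does not state the exact form of the modification.

```python
import numpy as np


def polya_urn_weights(cluster_sizes, alpha):
    """Standard Polya urn (Chinese restaurant process) prior.

    Returns the probability of joining each existing cluster, followed by
    the probability of opening a new cluster (last entry).
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    # Existing clusters are weighted by size, a new cluster by alpha.
    weights = np.append(sizes, alpha)
    return weights / weights.sum()


def label_aware_urn_weights(label_counts, new_label, alpha, penalty=0.1):
    """Hypothetical label-aware urn (an assumption, not the paper's rule).

    label_counts: array of shape (n_clusters, n_labels), n_clusters >= 1;
        entry [k, c] counts the points with label c in cluster k.
    new_label: label index of the point being assigned.
    penalty: factor in (0, 1] applied to clusters whose majority label
        differs from new_label.
    """
    counts = np.asarray(label_counts, dtype=float)
    sizes = counts.sum(axis=1)
    majority = counts.argmax(axis=1)
    # Down-weight clusters dominated by a different label.
    factor = np.where(majority == new_label, 1.0, penalty)
    weights = np.append(sizes * factor, alpha)
    return weights / weights.sum()
```

With `penalty=1.0` the variant reduces to the standard urn. In a Gibbs-style sampler for a Gaussian mixture, prior weights of this kind would typically be multiplied by each cluster's likelihood for the point before sampling an assignment.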

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition