Clustering nominal data using unsupervised binary decision trees: Comparisons with the state of the art methods

Article ID	Journal	Published Year	Pages	File Type
4969775	Pattern Recognition	2017	30 Pages	PDF

Abstract

In this work, we propose an extension of CUBT (clustering using unsupervised binary trees) to nominal data. For this purpose, we primarily use heterogeneity criteria and dissimilarity measures based on mutual information, entropy and Hamming distance. We show that, for this type of data, CUBT outperforms most of the existing methods. We also provide and justify some guidelines and heuristics for tuning the parameters of CUBT. We compare CUBT extensively with other well-known approaches on simulated data and present two applications to real datasets.
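As a rough illustration of the three quantities named in the abstract, the following sketch computes the textbook Shannon entropy, mutual information and Hamming distance for nominal observations. It is not the paper's exact heterogeneity criterion or splitting rule; the function names and the toy data are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(u != v for u, v in zip(a, b))


# Toy nominal variables (hypothetical data): the two columns are
# perfectly dependent, so they share all of their information.
colour = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]

h_colour = entropy(colour)
mi = mutual_information(colour, shape)
d = hamming(("red", "round"), ("red", "square"))
```

Entropy measures how mixed the categories are within a group of observations, mutual information measures dependence between two nominal variables, and Hamming distance counts the attributes on which two records disagree.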

Keywords

68T10; 62H30; Entropy; Mutual information; Clustering; Nominal data; Unsupervised learning