Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4969775 | Pattern Recognition | 2017 | 30 Pages |
Abstract
In this work, we propose an extension of CUBT (clustering using unsupervised binary trees) to nominal data. For this purpose, we primarily use heterogeneity criteria and dissimilarity measures based on mutual information, entropy and Hamming distance. We show that for this type of data, CUBT outperforms most of the existing methods. We also provide and justify some guidelines and heuristics for tuning the parameters of CUBT. We compare CUBT extensively with other well-known approaches using simulations, and we present two applications to real datasets.
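For readers unfamiliar with the three quantities named in the abstract, the following minimal sketch shows their textbook definitions on nominal data. It is an illustration only: the function names are hypothetical, and the exact heterogeneity criteria and dissimilarity measures used in CUBT may be defined differently in the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mutual_information(xs, ys):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy nominal data (hypothetical values, not taken from the paper).
colour = ["red", "red", "blue", "blue"]
size = ["s", "l", "s", "l"]

h_colour = entropy(colour)                    # two equally likely categories
d = hamming(("red", "s"), ("red", "l"))       # records differ in one attribute
mi_same = mutual_information(colour, colour)  # a variable shares all its information with itself
mi_indep = mutual_information(colour, size)   # empirically independent in this toy sample
```

Entropy here measures how mixed a single nominal variable is, Hamming distance counts mismatched attributes between two observations, and mutual information measures how much two nominal variables tell us about each other.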
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Badih Ghattas, Pierre Michel, Laurent Boyer