Evaluation of classification quality and comparative analysis of clustering and self-organization

Article ID	Journal	Published Year	Pages	File Type
489070	Procedia Computer Science	2011	6 Pages	PDF

Abstract

Clustering is a way of classifying a multi-dimensional dataset by the similarities of its dimensions. The results from clustering must be analyzed to test the accuracy of the algorithm and its implementation. This analysis is sometimes done by a visual representation of the clustered dataset. However, it is impossible to visually represent a dataset with more than four dimensions. Statistical analysis makes this feasible. The analysis performed on the output calculates the centroid of each cluster and the cluster's relation to that centroid. We have investigated two modes of hierarchical clustering and spectral clustering. The standard deviation of each dimension from the centroid, the maximum Euclidean distance from the centroid, and the dimensions that elements of each cluster have in common are also computed. The performed experiments demonstrate which clustering algorithm presents most accurate results under certain circumstances through the use of a synthesis of visual representation and the statistical analysis proposed above.