Article ID: 489070
Journal: Procedia Computer Science
Published Year: 2011
Pages: 6
File Type: PDF
Abstract

Clustering partitions a multi-dimensional dataset into groups of elements that are similar across their dimensions. The results of clustering must be analyzed to test the accuracy of both the algorithm and its implementation. This analysis is sometimes done by visually inspecting the clustered dataset. However, a dataset with more than four dimensions cannot be represented visually in any direct way, and statistical analysis makes the evaluation feasible. We investigated two modes of hierarchical clustering, as well as spectral clustering. The analysis performed on their output computes the centroid of each cluster and how the cluster's elements relate to that centroid: the standard deviation of each dimension from the centroid, the maximum Euclidean distance from the centroid, and the dimensions that the elements of each cluster have in common. The experiments, which combine visual representation with the statistical analysis described above, show which clustering algorithm produces the most accurate results under particular circumstances.
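The per-cluster statistics named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `cluster_statistics` and `summarize_clusters` are hypothetical, the standard deviation uses the population form (dividing by n), and "dimensions that the elements of each cluster have in common" is assumed to mean dimensions whose values do not vary within the cluster (standard deviation at or below a tolerance `tol`). The cluster labels are taken as given, so the same summary applies to the output of hierarchical or spectral clustering alike.

```python
import math
from typing import Dict, List, Sequence


def cluster_statistics(points: Sequence[Sequence[float]], tol: float = 1e-9) -> Dict[str, object]:
    """Summarize one cluster of d-dimensional points (hypothetical helper)."""
    n = len(points)
    d = len(points[0])
    # Centroid: the mean of each dimension over the cluster's members.
    centroid = [sum(p[j] for p in points) / n for j in range(d)]
    # Population standard deviation of each dimension about the centroid.
    std = [math.sqrt(sum((p[j] - centroid[j]) ** 2 for p in points) / n) for j in range(d)]
    # Largest Euclidean distance from any member to the centroid.
    max_distance = max(math.dist(p, centroid) for p in points)
    # Assumption: a dimension is "shared" when its values do not vary (std <= tol).
    shared_dims = [j for j in range(d) if std[j] <= tol]
    return {
        "centroid": centroid,
        "std": std,
        "max_distance": max_distance,
        "shared_dims": shared_dims,
    }


def summarize_clusters(data: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, Dict[str, object]]:
    """Group points by their cluster label and summarize each group (hypothetical helper)."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(data, labels):
        groups.setdefault(label, []).append(point)
    return {label: cluster_statistics(members) for label, members in sorted(groups.items())}
```

For example, with `data = [[0.0, 1.0, 5.0], [2.0, 1.0, 5.0], [10.0, 3.0, 0.0], [10.0, 5.0, 0.0]]` and `labels = [0, 0, 1, 1]`, cluster 0 has centroid `[1.0, 1.0, 5.0]` and shares dimensions 1 and 2, while cluster 1 has centroid `[10.0, 4.0, 0.0]` and shares dimensions 0 and 2. Because these numbers are computed per dimension, they remain meaningful when the dataset has too many dimensions to plot.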
