Article ID: 393897
Journal: Information Sciences
Published Year: 2013
Pages: 19
File Type: PDF
Abstract

In this paper we give a general definition of the concept ‘optimal clustering’ that is applicable to overlapping clusterings. Overlapping clusterings are a generalization of hard clusterings, and their structure is formally developed in this paper. It is generally assumed that the domain of clustering is too heuristic to allow a general, i.e. axiomatic, definition of an optimal clustering. It is shown, however, that such a definition can be given within the domain of overlapping clusterings, using the new concept of dual clustering developed in this paper. A second concept that underlies our definition of optimal clustering is the average clustering, which also plays an important role in the domain of cluster ensembles. Using the general concepts discussed in this paper, it is then shown that, under some conditions, the final hard clustering extracted by majority vote from a given set of clusterings is guaranteed to be optimal over all hard clusterings.

Unlike traditional research on validating clusterings, we do not develop a new cluster validation measure on top of the many existing ones; rather, we develop a general framework for cluster validation measures, at least within the domain of overlapping clusterings. This framework makes it possible to develop some general theorems about clustering.
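To make the ideas of an average clustering and majority-vote extraction more concrete, the following is a minimal sketch in Python. It is one simple reading of those terms, not the paper's formal construction: it assumes every input is a hard clustering given as a list of integer labels, that cluster labels are already aligned across the inputs, and that ties are broken in favor of the lowest label. The function names `average_clustering` and `majority_vote` and the toy data are hypothetical.

```python
def average_clustering(clusterings, n_clusters):
    """Average the one-hot memberships of each object over all input clusterings.

    Assumption: labels are integers in range(n_clusters) and are already
    aligned, i.e. label k denotes the same cluster in every input.
    """
    n_objects = len(clusterings[0])
    weight = 1.0 / len(clusterings)
    avg = [[0.0] * n_clusters for _ in range(n_objects)]
    for labels in clusterings:
        for i, k in enumerate(labels):
            avg[i][k] += weight
    return avg


def majority_vote(avg):
    """Assign each object to the cluster with the largest average membership.

    Ties go to the lowest cluster index (an arbitrary choice for this sketch).
    """
    return [max(range(len(row)), key=row.__getitem__) for row in avg]


# Three hypothetical hard clusterings of five objects into three clusters.
clusterings = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 2, 2],
    [0, 1, 1, 1, 2],
]

avg = average_clustering(clusterings, n_clusters=3)
consensus = majority_vote(avg)
print(consensus)
```

Each row of `avg` holds fractional memberships that sum to one, so it can be read as a simple overlapping clustering, while `consensus` is the hard clustering obtained from it. Whether that hard clustering is optimal depends on the conditions established in the paper, which this sketch does not check.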

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence