An efficient and scalable family of algorithms for combining clusterings

Journal: Engineering Applications of Artificial Intelligence
Published: 2013
Pages: 15

Abstract

• We propose a novel family of algorithms that improves clustering accuracy by combining multiple clusterings.
• Our methods are fast, accurate, and highly efficient in both memory and CPU usage.
• Our methods are faster and more accurate than state-of-the-art methods.
• Evaluations on several very challenging real data sets establish the merit of our methods.

Clustering is the process of grouping similar objects, where similarity between objects is usually measured by a distance metric. The groups formed by a clustering method are referred to as clusters. Clustering is widely used, with applications ranging from biology to economics. Each clustering technique has its own advantages and disadvantages, and some clustering algorithms require input parameters that strongly affect the result. In most cases, it is not possible to choose the best distance metric, the best clustering method, and the best input parameter values for a given data set. Therefore, multiple clusterings can be obtained by varying the distance metric, the clustering method, and the input parameter values, and these clusterings can then be combined into a new final clustering of better quality. We propose a family of algorithms for combining multiple clusterings that are memory efficient, scalable, robust, and intuitive. By working at the cluster level, our new algorithms offer large speed gains and low memory requirements while producing very good quality final clusters. Extensive experimental evaluations on very challenging artificially generated and real data sets from a diverse set of domains establish the usefulness of our methods.
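The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of combining clusterings at the cluster level, assuming each input clustering is given as a label vector over the same n objects. It is not the authors' method: the Jaccard similarity between clusters, the linking cut-off `threshold`, the connected-component merge, the majority vote, and the function name `combine_clusterings` are all hypothetical, illustrative choices. The point it shows is that the expensive pairwise work happens between clusters, not between objects.

```python
from collections import Counter, defaultdict
from itertools import combinations


def combine_clusterings(clusterings, threshold=0.5):
    """Hypothetical cluster-level consensus sketch (not the paper's algorithm).

    clusterings: list of label sequences, one per input clustering, each of
                 length n (labels[i] is the cluster of object i).
    threshold:   hypothetical Jaccard cut-off for linking two clusters.
    """
    n = len(clusterings[0])

    # 1. Every cluster of every input clustering becomes a node (a set of object ids).
    clusters = []
    for labels in clusterings:
        groups = defaultdict(set)
        for obj, lab in enumerate(labels):
            groups[lab].add(obj)
        clusters.extend(groups.values())

    # 2. Union-find over clusters (not objects): memory grows with the
    #    number of clusters K instead of with n * n object pairs.
    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # 3. Merge clusters whose Jaccard similarity reaches the threshold.
    for a, b in combinations(range(len(clusters)), 2):
        inter = len(clusters[a] & clusters[b])
        union = len(clusters[a] | clusters[b])
        if inter / union >= threshold:
            parent[find(a)] = find(b)

    # 4. Each object votes for the merged group of every cluster containing it.
    votes = [Counter() for _ in range(n)]
    for idx, members in enumerate(clusters):
        root = find(idx)
        for obj in members:
            votes[obj][root] += 1

    # 5. Majority vote (ties go to the group counted first), then relabel
    #    groups as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    result = []
    for v in votes:
        root = v.most_common(1)[0][0]
        result.append(relabel.setdefault(root, len(relabel)))
    return result


if __name__ == "__main__":
    clusterings = [
        [0, 0, 0, 1, 1, 1],  # clustering A
        [1, 1, 1, 0, 0, 0],  # same partition as A, labels swapped
        [0, 0, 1, 1, 2, 2],  # a noisier clustering
    ]
    print(combine_clusterings(clusterings))  # [0, 0, 0, 1, 1, 1]
```

As a rough illustration of why cluster-level work is cheap: with n = 100,000 objects, an object-level pairwise (co-association) table has n(n−1)/2 = 4,999,950,000 entries, whereas 10 input clusterings of 20 clusters each give only K = 200 clusters and K(K−1)/2 = 19,900 cluster pairs. Real methods typically use sparser or smarter cluster comparisons than the all-pairs loop above.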

Keywords

Fast clustering; Cluster ensemble; Clustering