Article ID: 6856461
Journal: Information Sciences
Published Year: 2018
Pages: 22
File Type: PDF
Abstract
Choosing the number of clusters (k) remains a challenge for clustering methods. Rather than fixing k in advance, the implicit parallelism of evolutionary multi-objective optimization (EMO) offers an effective and efficient paradigm for finding the optimal clustering in an a posteriori manner: an EMO algorithm first searches for a set of non-dominated solutions representing clustering results with different k, and a validity index is then used to select the optimal clustering result. This study systematically investigates the use of EMO for multi-clustering (i.e., searching for multiple clusterings simultaneously). An effective bi-objective model is built in which the number of clusters and the sum of squared distances (SSD) between data points and their cluster centroids are the two objectives. To ensure that the two objectives conflict with each other, a novel transformation strategy is applied to the SSD. The model is then solved by an EMO algorithm. The resulting paradigm, EMO-k-clustering, is examined on three datasets with different properties, with NSGA-II serving as the EMO algorithm. Experimental results show that the proposed bi-objective model is effective and that EMO-k-clustering can efficiently obtain the clustering results for the different k values in a single run.
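The two objectives and the notion of a non-dominated set described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ssd`, `objectives`, `non_dominated`) and the toy data are assumptions made for illustration, the SSD is used in its raw form because the abstract does not describe the transformation strategy, and candidate clusterings are listed by hand rather than evolved by NSGA-II.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def ssd(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Sum of squared distances between each point and its cluster centroid."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def objectives(points: Sequence[Point], labels: Sequence[int]) -> Tuple[int, float]:
    """Bi-objective vector (k, raw SSD); both are to be minimised.

    Assumption: the paper's SSD transformation is omitted here.
    """
    return (len(set(labels)), ssd(points, labels))


def non_dominated(objs: Sequence[Tuple[float, ...]]) -> List[int]:
    """Indices of objective vectors that no other vector dominates."""
    keep = []
    for i, a in enumerate(objs):
        dominated = any(
            all(x <= y for x, y in zip(b, a)) and any(x < y for x, y in zip(b, a))
            for j, b in enumerate(objs)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
candidates = [
    [0, 0, 0, 0],  # k = 1
    [0, 0, 1, 1],  # k = 2, natural split
    [0, 1, 0, 1],  # k = 2, poor split
    [0, 1, 2, 3],  # k = 4, every point its own cluster
]
objs = [objectives(points, labels) for labels in candidates]
front = non_dominated(objs)  # the poor k = 2 split is dominated and dropped
```

In the a posteriori step, a validity index of the user's choice would then be evaluated on the clusterings indexed by `front` to pick one of them.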
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence