Article ID: 4948259
Journal: Neurocomputing
Published Year: 2017
Pages: 16
File Type: PDF
Abstract

Many supervised clustering algorithms have been developed to find the optimal clusters for static datasets by presetting some parameters, but they are seldom suitable for dynamic datasets, such as data streams of chunks. To find the optimal clusters in a data stream of chunks, a novel Supervised Adaptive Incremental Clustering (SAIC) algorithm is proposed. SAIC automatically clusters dynamic datasets of arbitrary shapes and sizes. It consists of a learning phase and a post-processing phase. In the learning phase, each cluster is updated adaptively according to its learning rate, which is calculated from its counter value. All data points are shuffled at each iteration to make SAIC insensitive to the input order of the data points. In the post-processing phase, outliers or boundary points are eliminated according to the counter value of each cluster and the number of iterations. Four synthetic datasets and fourteen UCI datasets are used to evaluate the performance of SAIC. The experiments on the UCI datasets show that SAIC matches or outperforms some other supervised clustering algorithms and several unsupervised incremental clustering algorithms. In addition, three data streams of chunks are used to evaluate SAIC from different aspects; the results show that SAIC is scalable and capable of incremental learning when clustering data streams of chunks.
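
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the learning rate of 1 / counter, the fixed joining `radius`, the rule that a point may only join a cluster with the same class label, and the pruning rule `counter < min_hits_per_iter * iterations` are all assumptions made for illustration. The function name `saic_sketch` and its parameters are likewise hypothetical.

```python
import math
import random


def saic_sketch(points, labels, radius=1.0, iterations=5,
                min_hits_per_iter=2, seed=0):
    """Toy supervised incremental clustering, loosely following the abstract.

    Assumptions (not taken from the source): learning rate = 1 / counter,
    a fixed `radius` for joining a cluster, same-label matching, and the
    pruning rule counter < min_hits_per_iter * iterations.
    """
    rng = random.Random(seed)
    clusters = []  # each cluster: {"center": [...], "label": ..., "counter": int}
    order = list(range(len(points)))

    # Learning phase: every iteration revisits all points in a new order.
    for _ in range(iterations):
        rng.shuffle(order)  # shuffling reduces sensitivity to input order
        for i in order:
            x, y = points[i], labels[i]
            # Find the nearest cluster that carries the same class label.
            best, best_dist = None, math.inf
            for c in clusters:
                if c["label"] != y:
                    continue
                d = math.dist(x, c["center"])
                if d < best_dist:
                    best, best_dist = c, d
            if best is None or best_dist > radius:
                # No suitable cluster: start a new one at this point.
                clusters.append({"center": list(x), "label": y, "counter": 1})
                continue
            # Adaptive update: the step shrinks as the counter grows.
            best["counter"] += 1
            rate = 1.0 / best["counter"]  # assumed learning-rate formula
            best["center"] = [m + rate * (v - m)
                              for m, v in zip(best["center"], x)]

    # Post-processing phase: drop clusters that were hit too rarely,
    # judged against the number of iterations (assumed rule).
    threshold = min_hits_per_iter * iterations
    return [c for c in clusters if c["counter"] >= threshold]
```

Under these assumptions, a cluster that attracts only a single point per iteration never reaches the threshold and is discarded as an outlier, while clusters that absorb several points per iteration survive. Calling `saic_sketch` again on a new chunk would restart from scratch; carrying the surviving clusters over between chunks would be needed for a true streaming variant and is left out here.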

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence