A Tabu search based clustering algorithm and its parallel implementation on Spark

Article ID	Journal	Published Year	Pages	File Type
6904135	Applied Soft Computing	2018	29 Pages	PDF

Abstract

The well-known K-means clustering algorithm is widely used in application domains ranging from data analytics to logistics. However, K-means is sensitive to factors such as the initial choice of centroids and can readily become trapped in a local optimum. In this paper, we propose an improved K-means clustering algorithm that is augmented by a Tabu Search strategy and is better adapted to the needs of big data applications. Our design focuses on enhancements that take advantage of parallel processing on the Spark framework. Computational experiments demonstrate the superiority of our parallel Tabu Search-based clustering algorithm over a widely used parallel version of K-means, the one implemented in Spark MLlib, comparing the algorithms in terms of scalability, accuracy, and effectiveness.
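
The abstract does not spell out the algorithm, so the following is only a generic, single-machine sketch of how a tabu list can help a clustering search move past local optima. It is not the authors' method and does not use Spark. The function names (`sse`, `tabu_cluster`), the move definition (reassigning one point to another cluster), and the parameters `iters` and `tenure` are all assumptions made for illustration.

```python
import random
from collections import deque


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return total


def tabu_cluster(points, k, iters=100, tenure=5, seed=0):
    """Hypothetical tabu search over cluster assignments (illustrative only)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    best_labels, best_cost = labels[:], sse(points, labels, k)
    # Recently undone (point, cluster) pairs; the oldest entry expires
    # automatically once `tenure` newer moves have been recorded.
    tabu = deque(maxlen=tenure)

    for _ in range(iters):
        candidate = None  # (cost, point index, target cluster)
        for i in range(len(points)):
            old = labels[i]
            for c in range(k):
                if c == old:
                    continue
                labels[i] = c
                cost = sse(points, labels, k)
                labels[i] = old
                # Aspiration: a tabu move is allowed only if it beats the best.
                if (i, c) in tabu and cost >= best_cost:
                    continue
                if candidate is None or cost < candidate[0]:
                    candidate = (cost, i, c)
        if candidate is None:
            break
        cost, i, c = candidate
        tabu.append((i, labels[i]))  # forbid moving point i straight back
        labels[i] = c                # accept the move even if it is worse
        if cost < best_cost:
            best_cost, best_labels = cost, labels[:]
    return best_labels, best_cost
```

Unlike plain K-means, the loop accepts the best admissible move even when it increases the cost, and the tabu list keeps the search from immediately undoing that move. This sketch recomputes the full cost for every candidate, which is only practical for tiny inputs; a scalable version would need incremental cost updates and a distributed evaluation step.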

Keywords

Spark; Tabu search; Clustering; Parallel computing; k-Means; Big Data