Article ID: 484823
Journal: Procedia Computer Science
Published Year: 2015
Pages: 9
File Type: PDF
Abstract

Data clustering is widely used in practice, including in social network analysis, location-based analysis, and scientific analysis. As the volume of data generated by Internet users keeps growing while memory and computation resources remain limited, designing scalable clustering algorithms has become increasingly important. In this paper, we propose HiClus, a distributed clustering algorithm based on heterogeneous cloud computing. HiClus efficiently builds an adaptive distributed tree in the cloud so that it can use the computation resources of both central processing units (CPUs) and general-purpose graphics processing units (GPGPUs). Our evaluation shows that HiClus scales better, takes less clustering time, and achieves better load balancing than existing MapReduce algorithms.
