Article ID: 431865 · Journal ID: 688642 · Publication year: 2013 · English article, full text: 13 pages (PDF)
English title (ISI article)
Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks
Related topics
Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics
Abstract

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited to distributed-memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency on communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution that can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis against state-of-the-art sampling methods shows that the proposed method overcomes the limitations of sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
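The abstract does not spell out the protocol, so the following is only a minimal sketch of how one K-Means iteration could be decentralised with epidemic (gossip) aggregation, assuming a simple push-pull pairwise averaging protocol. Each node computes per-cluster coordinate sums and point counts over its own data; gossip averaging drives every node towards the network-wide averages of those vectors, and because the 1/N factors cancel, the ratio of averaged sums to averaged counts estimates the centroid a centralised algorithm would compute over the aggregated data. The function names (`local_stats`, `gossip_average`, `epidemic_kmeans_step`, `centralised_step`), the round count and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import random


def local_stats(points, centroids):
    """Per-cluster coordinate sums and point counts for one node's local data."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0.0] * k
    for p in points:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        for i in range(d):
            sums[j][i] += p[i]
        counts[j] += 1
    return sums, counts


def gossip_average(vectors, rounds, seed=0):
    """Pairwise gossip averaging: a node and a random peer both adopt their mean vector."""
    rng = random.Random(seed)
    vecs = [list(v) for v in vectors]
    n = len(vecs)
    for _ in range(rounds):
        for a in range(n):
            b = rng.randrange(n - 1)  # pick a peer other than a
            if b >= a:
                b += 1
            mean = [(x + y) / 2 for x, y in zip(vecs[a], vecs[b])]
            vecs[a], vecs[b] = mean, list(mean)
    return vecs


def epidemic_kmeans_step(node_data, centroids, rounds=30, seed=0):
    """One decentralised K-Means update; returns every node's centroid estimate."""
    k, d = len(centroids), len(centroids[0])
    vectors = []
    for pts in node_data:
        sums, counts = local_stats(pts, centroids)
        vectors.append([x for row in sums for x in row] + counts)
    estimates = []
    for v in gossip_average(vectors, rounds, seed):
        sums, counts = v[:k * d], v[k * d:]
        new = []
        for j in range(k):
            if counts[j] > 0:
                # Averaged sum / averaged count = global mean (the 1/N factors cancel).
                new.append([sums[j * d + i] / counts[j] for i in range(d)])
            else:
                new.append(list(centroids[j]))  # empty cluster: keep the old centroid
        estimates.append(new)
    return estimates


def centralised_step(node_data, centroids):
    """Reference: the same update computed over the aggregated data."""
    all_points = [p for pts in node_data for p in pts]
    sums, counts = local_stats(all_points, centroids)
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(len(centroids))]


# Hypothetical data: 20 nodes, each holding points from two well-separated clusters.
rng = random.Random(1)
data = [[(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)]
        + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(3)]
        for _ in range(20)]
init = [(0.0, 0.0), (10.0, 10.0)]
reference = centralised_step(data, init)
estimates = epidemic_kmeans_step(data, init, rounds=50)
print(reference)
print(estimates[0])
```

Increasing `rounds` shrinks the gap between each node's estimate and the centralised result, which is one way to read the claim that the centralised solution can be approximated as closely as desired.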


► A fully decentralised K-Means clustering algorithm for asynchronous networks.
► The algorithm does not require global communication and is fault tolerant.
► It approximates the centralised K-Means algorithm as closely as desired.
► Tested with data whose cluster distributions vary from uniform to highly skewed.
► Tested under network failures (message loss) and node churn.
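To illustrate why gossip-based aggregation can tolerate message loss, the sketch below models a lost message as an exchange that simply does not happen. This is an assumption: a reply lost after only one side has updated would break the conservation property shown here, and the paper's own failure model may differ. Under this model a dropped exchange leaves both values unchanged, so the network-wide sum is conserved and the nodes still converge to the true average, only more slowly. `lossy_push_pull`, `max_error` and the loss rate of 0.3 are hypothetical choices for illustration; node churn is not modelled.

```python
import random


def lossy_push_pull(values, rounds, loss_prob, seed=0):
    """Pairwise gossip averaging in which each exchange is lost with probability loss_prob."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        for a in range(n):
            b = rng.randrange(n - 1)  # pick a peer other than a
            if b >= a:
                b += 1
            if rng.random() < loss_prob:
                continue  # lost exchange: neither node updates, so the total is unchanged
            x[a] = x[b] = (x[a] + x[b]) / 2
    return x


def max_error(values, rounds, loss_prob, seed=0):
    """Largest deviation of any node's estimate from the true network-wide average."""
    target = sum(values) / len(values)
    return max(abs(v - target) for v in lossy_push_pull(values, rounds, loss_prob, seed))


# Hypothetical network of 200 nodes, each starting with one local value.
rng = random.Random(42)
values = [rng.uniform(0, 100) for _ in range(200)]
for r in (5, 20, 60):
    print(r, max_error(values, r, loss_prob=0.3))
```

In this model, raising `loss_prob` mainly slows convergence rather than biasing the result, since every completed exchange preserves the sum.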

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 3, March 2013, Pages 317–329
Authors
, , , ,