Article code: 426187
Journal code: 686009
Publication year: 2011
English article: 14 pages, PDF
Full text: free download
English title of the ISI article
Mining massive datasets by an unsupervised parallel clustering on a GRID: Novel algorithms and case study
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract

This paper proposes three novel parallel clustering algorithms based on Kohonen's SOM. They aim to preserve the topology of the original dataset, both for a meaningful visualization of the results and for discovering associations between features of the dataset through topological operations over the clusters. In all three algorithms the data to be clustered are subdivided among the nodes of a GRID. In the first two algorithms each node executes an on-line SOM, whereas in the third the nodes execute a quasi-batch SOM called MANTRA. The algorithms differ in how the weights computed by the slave nodes are recombined by a master to launch the next epoch of the SOM on the nodes. A proof outline demonstrates the convergence of the proposed parallel SOMs and indicates how to select the learning rate so as to outperform both the sequential SOM and the parallel SOMs available in the literature. A case study in bioinformatics illustrates that the parallel SOM can obtain meaningful clusters in massive data mining applications in a fraction of the time needed by the sequential SOM, and that the resulting classification supports fruitful knowledge extraction from massive datasets.


► A novel parallel clustering method is proposed for accelerating the Self-Organizing Map (SOM) while preserving the topology of the original dataset.
► A proof outline demonstrates the convergence of the proposed parallel SOM.
► Heuristics are provided to maximize the speed-up and to minimize the topological error.
► Simulations show that the parallel SOM outperforms the currently available methods for parallelizing the SOM.
► A case study in bioinformatics illustrates how the method assists in discovering associations between data features from the topological properties of the clusters.
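
The master/slave scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not say how the master recombines the slaves' weights, so the plain averaging in `recombine` is an assumption made purely for illustration. The 1-D map, the Gaussian neighbourhood, the linear decay schedule, and every function and parameter name are likewise hypothetical, and the "nodes" are simulated sequentially instead of running on a GRID.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the map unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def online_som_epoch(weights, data, lr, radius):
    """Run one on-line SOM epoch over `data` on a 1-D map; return new weights."""
    weights = [list(w) for w in weights]  # work on a copy, as a slave node would
    for x in data:
        bmu = best_matching_unit(weights, x)
        for i, w in enumerate(weights):
            # Gaussian neighbourhood centred on the best matching unit.
            h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
            for d in range(len(w)):
                w[d] += lr * h * (x[d] - w[d])
    return weights


def recombine(slave_weights):
    """ASSUMPTION: the master averages the slaves' weights unit by unit."""
    n = len(slave_weights)
    units = len(slave_weights[0])
    dim = len(slave_weights[0][0])
    return [
        [sum(sw[u][d] for sw in slave_weights) / n for d in range(dim)]
        for u in range(units)
    ]


def parallel_som(data, n_units=4, n_slaves=3, epochs=20, lr=0.5, radius=1.0, seed=0):
    """Simulate the master/slave loop: split data, train per slave, recombine."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    # Each simulated slave receives a disjoint share of the dataset.
    chunks = [data[i::n_slaves] for i in range(n_slaves)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # hypothetical linear decay schedule
        slave_weights = [
            online_som_epoch(weights, chunk, lr * decay, max(radius * decay, 0.1))
            for chunk in chunks  # sequential stand-in for the GRID nodes
        ]
        weights = recombine(slave_weights)  # master prepares the next epoch
    return weights
```

Calling `parallel_som(data)` on a list of equal-length numeric vectors returns one weight vector per map unit. Replacing `recombine` is the natural place to experiment with other recombination rules, since that step is where the abstract says the three proposed algorithms differ.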

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 27, Issue 6, June 2011, Pages 711–724