کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
425913 685951 2014 21 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Shortest-linkage-based parallel hierarchical clustering on main-belt moving objects of the solar system
ترجمه فارسی عنوان
خوشه بندی سلسله مراتبی موازی مبتنی بر کوتاه ترین بر روی اشیاء متحرک خورشیدی در کمربندهای اصلی
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی


• We parallelize traditional hierarchical clustering based on shortest linkage.
• We use two real datasets, 550,000 and 300,000 objects, for performance evaluations.
• The results show that our pruning strategy reduces execution time and storage usage.
• Our fast update algorithm adds an object into a tree of 550,000 objects in seconds.

Data clustering is an important data preparation process in many scientific analysis researches. In astronomy, although the distributed environments and modern observation techniques enable users to collect and access huge amounts of data, the corresponding clustering process may become very costly. One of the challenges is that the sequential clustering algorithms, that can be applied to cluster hundreds of thousand main-belt asteroids to reason about the origins of the main-belt asteroids, may not be used in the distributed environment directly. Therefore, this study focuses on the problem of parallelizing the traditional hierarchical agglomerative clustering algorithm using shortest-linkage. We propose a new parallel hierarchical agglomerative clustering algorithm based on the master–worker model. The master process divides the whole computation into several small tasks, and distributes the tasks to the worker processes for parallel processing. Then, the master process merges the results from the worker processes to form a hierarchical data structure. The proposed algorithm uses a pruning threshold to reduce the execution time and the storage requirement during the computation. It also supports fast incremental update that merges new data items into a constructed hierarchical tree in seconds, given a tree of about 550,000 data items. To evaluate the performance of our algorithm, this study has conducted several experiments using the MPCORB dataset and a dataset from the DVO database. The results confirm the efficiency of our proposed methodology. Compared with prior similar studies, the proposed algorithm is more flexible and practical in the problem of distributed hierarchical agglomerative clustering.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Future Generation Computer Systems - Volume 34, May 2014, Pages 26–46
نویسندگان
, , , , ,