Article code: 490198
Journal code: 705691
Publication year: 2014
English article: 12-page PDF
Full-text version: Free download
English title of the ISI article
SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science (General)
English abstract

Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The key insight used is that for finding the cluster pair with the smallest distance, it is unnecessary to complete the computation of all cluster pairwise distances. Partial information can be utilized to calculate lower bounds on cluster pairwise distances, which are subsequently used for cluster distance comparison. Our experimental results show that SparseHC achieves a linear empirical memory complexity, which is a significant improvement compared to existing algorithms.
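
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes complete linkage (the abstract does not name a linkage scheme), an input stream of `(distance, i, j)` tuples already sorted by ascending distance, and no duplicate edges. The function name `online_complete_linkage` and all variable names are hypothetical. Because the stream is sorted, every unseen edge is at least as long as the current one, so the current distance is a lower bound on the complete-linkage distance of any cluster pair whose edges have not all been scanned. The first pair to have all of its edges scanned can therefore be merged at once.

```python
from collections import defaultdict


def online_complete_linkage(sorted_edges, n):
    """Build complete-linkage merges from edges sorted by ascending distance.

    sorted_edges: iterable of (distance, i, j) with 0 <= i, j < n.
    Returns a list of (cluster_a, cluster_b, distance, new_cluster_id).
    """
    cluster_of = list(range(n))            # point index -> current cluster id
    members = {c: [c] for c in range(n)}   # cluster id -> point indices
    seen = defaultdict(dict)               # seen[a][b] = edges scanned between a and b
    merges = []
    next_id = n

    for dist, i, j in sorted_edges:
        a, b = cluster_of[i], cluster_of[j]
        if a == b:
            continue  # edge inside an existing cluster carries no information

        count = seen[a].get(b, 0) + 1
        seen[a][b] = seen[b][a] = count
        if count < len(members[a]) * len(members[b]):
            continue  # pair still incomplete; its true distance is >= dist

        # All edges between a and b are scanned, so dist is their maximum.
        c = next_id
        next_id += 1
        merges.append((a, b, dist, c))
        members[c] = members.pop(a) + members.pop(b)
        for p in members[c]:
            cluster_of[p] = c

        # Carry the partial edge counts of a and b over to the new cluster.
        combined = {}
        for old in (a, b):
            for other, k in seen.pop(old).items():
                if other in (a, b):
                    continue
                combined[other] = combined.get(other, 0) + k
                del seen[other][old]
        for other, k in combined.items():
            seen[c][other] = seen[other][c] = k

    return merges


# Four points on a line at positions 0, 4, 7, 9 (indices 0..3).
edges = [(2, 2, 3), (3, 1, 2), (4, 0, 1), (5, 1, 3), (7, 0, 2), (9, 0, 3)]
result = online_complete_linkage(edges, 4)
```

Only edge counts for cluster pairs that have appeared in the stream are kept, rather than the full distance matrix. Since `sorted_edges` can be any iterable, the edges could be read chunk by chunk from disk. If the input is sparse, pairs with missing edges never complete and are simply left unmerged in this sketch.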

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 29, 2014, Pages 8-19