کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6868365 1439958 2018 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Variations on the Clustering Algorithm BIRCH
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
Variations on the Clustering Algorithm BIRCH
چکیده انگلیسی
Clustering algorithms are recently regaining attention with the availability of large datasets and the rise of parallelized computing architectures. However, most clustering algorithms suffer from two drawbacks: they do not scale well with increasing dataset sizes and often require proper parametrization which is usually difficult to provide. A very important example is the cluster count, a parameter that in many situations is next to impossible to assess. In this paper we present A-BIRCH, an approach for automatic threshold estimation for the BIRCH clustering algorithm. This approach computes the optimal threshold parameter of BIRCH from the data, such that BIRCH does proper clustering even without the global clustering phase that is usually the final step of BIRCH. This is possible if the data satisfies certain constraints. If those constraints are not satisfied, A-BIRCH will issue a pertinent warning before presenting the results. This approach renders the final global clustering step of BIRCH unnecessary in many situations, which results in two advantages. First, we do not need to know the expected number of clusters beforehand. Second, without the computationally expensive final clustering, the fast BIRCH algorithm will become even faster. For very large data sets, we introduce another variation of BIRCH, which we call MBD-BIRCH, which is of particular advantage in conjunction with A-BIRCH but is independent from it and also of general benefit.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Big Data Research - Volume 11, March 2018, Pages 44-53
نویسندگان
, , , , , ,