کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
10225731 1701206 2018 32 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning
چکیده انگلیسی
The majority of the clinical observation data stored in large-scale Electronic Health Record (EHR) research data networks are unlabeled. Unsupervised clustering can provide invaluable tools for studying patient sub-groups in these data. Many of the popular unsupervised clustering algorithms are dependent on identifying the number of clusters. Multiple statistical methods are available to approximate the number of clusters in a dataset. However, available methods are computationally inefficient when applied to large amounts of data. Scalable analytical procedures are needed to extract knowledge from large clinical datasets. Using both simulated, clinical, and public data, we developed and tested the kluster procedure for approximating the number of clusters in a large clinical dataset. The kluster procedure iteratively applies four statistical cluster number approximation methods to small subsets of data that were drawn randomly with replacements and recommends the most frequent and mean number of clusters resulted from the iterations as the potential optimum number of clusters. Our results showed that the kluster's most frequent product that iteratively applies a model-based clustering strategy using Bayesian Information Criterion (BIC) to samples of 200-500 data points, through 100 iterations, offers a reliable and scalable solution for approximating the number of clusters in unsupervised clustering. We provide the kluster procedure as an R package.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Big Data Research - Volume 13, September 2018, Pages 38-51
نویسندگان
, , ,