Article code: 5129322
Journal code: 1489639
Publication year: 2017
English article: 22-page PDF
Full-text version: Free download
English title of the ISI article
Feature screening in large scale cluster analysis
Related topics
Engineering and Basic Sciences, Mathematics, Numerical Analysis
English abstract

We propose a novel methodology for feature screening in the clustering of massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex clustering criterion, we propose a highly scalable screening procedure that efficiently discards non-informative features by first computing a clustering score corresponding to the clustering tree constructed for each feature, and then thresholding the resulting values. We provide theoretical support for our approach by establishing uniform non-asymptotic bounds on the clustering scores of the “noise” features. These bounds imply perfect screening of non-informative features with high probability and are derived via careful analysis of the empirical processes corresponding to the clustering trees that are constructed for each of the features by the associated clustering procedure. Through extensive simulation experiments, we compare the performance of our proposed method with other screening approaches popularly used in cluster analysis and obtain encouraging results. We demonstrate empirically that our method is applicable to cluster analysis of big datasets arising in single-cell gene expression studies.
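
The screen-then-threshold idea in the abstract can be illustrated with a minimal sketch. It does not implement the paper's convex-clustering score: as a stand-in, each feature is scored by the largest fraction of its variance explained by any two-group split of its sorted values. The function names `split_score` and `screen_features`, the simulated data, and the threshold 0.8 are all assumptions made for illustration only.

```python
import numpy as np

def split_score(x):
    """Stand-in per-feature score (NOT the paper's criterion): the largest
    fraction of variance explained by a two-group split of the sorted values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    total = np.sum((x - x.mean()) ** 2)
    if n < 2 or total == 0.0:
        return 0.0  # a constant feature carries no cluster signal
    csum = np.cumsum(x)
    k = np.arange(1, n)                      # size of the left group
    left_mean = csum[:-1] / k
    right_mean = (csum[-1] - csum[:-1]) / (n - k)
    # Between-group sum of squares for every possible split point.
    between = k * (n - k) / n * (left_mean - right_mean) ** 2
    return float(between.max() / total)

def screen_features(X, threshold):
    """Score each column independently, then keep columns above the threshold."""
    scores = np.array([split_score(X[:, j]) for j in range(X.shape[1])])
    keep = np.flatnonzero(scores > threshold)
    return keep, scores

# Simulated example: 2 informative features (two clusters) and 8 noise features.
rng = np.random.default_rng(0)
n, p = 500, 10
labels = np.repeat([-1.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[:, :2] += 3.0 * labels[:, None]            # shift the informative columns

keep, scores = screen_features(X, threshold=0.8)  # threshold chosen by hand
print("kept features:", keep)
print("scores:", np.round(scores, 2))
```

Because each feature is scored on its own, the loop parallelizes trivially across features, which is the property that makes this kind of screening attractive when the number of features is large. In the paper the threshold is backed by non-asymptotic bounds on the noise-feature scores; here it is simply picked by hand.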

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Multivariate Analysis - Volume 161, September 2017, Pages 191-212
Authors
, , ,