Article code: 5129528
Journal code: 1489738
Publication year: 2017
Full text: English, 14-page PDF
Title
Statistical properties of the single linkage hierarchical clustering estimator
Related topics
Engineering and Basic Sciences; Mathematics; Applied Mathematics
Abstract


• We incorporate a statistical model of uncertainty in the pairwise distances into the estimation of hierarchical clustering.
• We focus on single linkage hierarchical clustering (SLHC) and study its geometry.
• Under certain conditions on the distribution model for the measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE).
• Applied to maximum likelihood estimates of the pairwise distances from multiple measurements, SLHC is shown to be a consistent estimator.
• The results indicate that a full MLE should outperform SLHC.
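The highlights treat SLHC as an estimator of a hierarchy. As a minimal sketch (not the authors' code), SLHC on a symmetric pairwise distance matrix can be computed with Kruskal's minimum spanning tree algorithm. The hierarchy is represented both by its merge heights and by the ultrametric it induces, where u[i][j] is the height at which i and j first share a cluster. The function name `single_linkage` and the list-of-lists input format are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def single_linkage(
    dist: List[List[float]],
) -> Tuple[List[Tuple[float, int, int]], List[List[float]]]:
    """Single linkage HC via Kruskal's algorithm on the complete graph.

    Returns the merge sequence as (height, root_a, root_b) tuples and the
    induced ultrametric u (hypothetical representation chosen for this sketch).
    """
    n = len(dist)
    parent = list(range(n))
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}  # root -> cluster points

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider every pair once, from the smallest distance upward.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    u = [[0.0] * n for _ in range(n)]
    merges: List[Tuple[float, int, int]] = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster at a lower height
        # Every cross pair between the two clusters first meets at height w.
        for a in members[ri]:
            for b in members[rj]:
                u[a][b] = u[b][a] = w
        merges.append((w, ri, rj))
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
        if len(merges) == n - 1:
            break  # a single cluster remains
    return merges, u


if __name__ == "__main__":
    d = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
    merges, u = single_linkage(d)
    print(merges)  # merge heights 1, 2, 3
    print(u)
```

Using a spanning-tree formulation keeps the sketch short. The merge heights it produces are exactly the single linkage heights, because each merge uses the smallest remaining distance between two current clusters.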

Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis, but few authors account for uncertainty in the distance data. We incorporate a statistical model of this uncertainty, in the form of corruption or noise in the pairwise distances, and investigate the problem of estimating the HC as an unknown parameter from measurements. Specifically, we focus on single linkage hierarchical clustering (SLHC) and study its geometry. We prove that, under fairly reasonable conditions on the probability distribution governing the measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE) in which some of the information contained in the data is ignored. At the same time, we show that applying SLHC directly to the maximum likelihood estimates (MLEs) of the pairwise distances yields a consistent estimator. Consequently, a full MLE is expected to outperform SLHC in recovering the correct HC of the ground-truth metric.
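The consistency claim can be illustrated numerically. This sketch assumes a measurement model that the abstract does not specify: each measurement of d(i, j) equals the true distance plus independent Gaussian noise with a fixed variance. Under that assumption, the MLE of each pairwise distance is the sample mean of its measurements. SLHC is then applied to the estimates through a minimax (bottleneck) path closure, which gives the same ultrametric as single linkage. The names `mle_distances`, `slhc_ultrametric`, `clusters_at`, and `toy_metric`, and the toy metric itself, are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def toy_metric() -> Matrix:
    """Hypothetical ground truth: two groups {0,1,2} and {3,4,5}, 5.0 apart."""
    n = 6
    within = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 2.0, (3, 4): 1.0, (3, 5): 2.0, (4, 5): 2.0}
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = within.get((i, j), 5.0)
    return d


def mle_distances(true_d: Matrix, k: int, sigma: float, rng: random.Random) -> Matrix:
    """Average k noisy measurements per pair (the MLE under iid Gaussian noise, an assumption)."""
    n = len(true_d)
    est = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            samples = [true_d[i][j] + rng.gauss(0.0, sigma) for _ in range(k)]
            est[i][j] = est[j][i] = sum(samples) / k
    return est


def slhc_ultrametric(d: Matrix) -> Matrix:
    """u[i][j] = min over paths i..j of the largest edge (Floyd-Warshall style, O(n^3))."""
    n = len(d)
    u = [row[:] for row in d]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                cand = max(u[i][m], u[m][j])
                if i != j and cand < u[i][j]:
                    u[i][j] = cand
    return u


def clusters_at(u: Matrix, threshold: float) -> List[List[int]]:
    """Cut the hierarchy: points within ultrametric distance <= threshold share a cluster."""
    n = len(u)
    seen = [False] * n
    groups: List[List[int]] = []
    for i in range(n):
        if not seen[i]:
            group = [j for j in range(n) if u[i][j] <= threshold]
            for j in group:
                seen[j] = True
            groups.append(group)
    return groups


if __name__ == "__main__":
    true_d = toy_metric()
    est = mle_distances(true_d, k=200, sigma=0.5, rng=random.Random(0))
    u = slhc_ultrametric(est)
    print(clusters_at(u, 3.0))
    print(clusters_at(u, 1.5))
```

With 200 measurements per pair, the averaged distances lie close to the truth, and cutting the estimated hierarchy should reproduce the ground-truth clusters at both levels. This toy example shows the behavior the consistency result describes. It does not reproduce the paper's proof or its more general distribution model.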

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Statistical Planning and Inference - Volume 185, June 2017, Pages 15–28