Article ID Journal Published Year Pages File Type
6869255 Computational Statistics & Data Analysis 2016 18 Pages PDF
Abstract
A novel statistical procedure for clustering individuals characterized by sparse-specific profiles is introduced in the context of data summarized in sparse contingency tables. The proposed procedure relies on a single-linkage clustering based on a new dissimilarity measure designed to give equal influence to sparsity and specificity of profiles. Theoretical properties of the new dissimilarity are derived by characterizing single-linkage clustering using Minimum Spanning Trees. Such characterization allows the description of situations for which the proposed dissimilarity outperforms competing dissimilarities. Simulation examples are performed to demonstrate the strength of the new dissimilarity compared to 11 other methods. The analysis of a genomic dataset dedicated to the study of molecular signatures of selection is used to illustrate the efficiency of the proposed method in a real situation.
Keywords
Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, , ,