Article ID Journal Published Year Pages File Type
417037 Computational Statistics & Data Analysis 2010 18 Pages PDF
Abstract

Statistical clustering criteria with free scale parameters and unknown cluster sizes are inclined to create small, spurious clusters. To mitigate this tendency a statistical model for cardinality-constrained clustering of data with gross outliers is established, its maximum likelihood and maximum a posteriori   clustering criteria are derived, and their consistency and robustness are analyzed. The criteria lead to constrained optimization problems that can be solved by using iterative, alternating trimming algorithms of kk-means type. Each step in the algorithms requires the solution of a λλ-assignment problem known from combinatorial optimization. The method allows one to estimate the numbers of clusters and outliers. It is illustrated with a synthetic data set and a real one.

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, ,