Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
417037 | Computational Statistics & Data Analysis | 2010 | 18 Pages |

Statistical clustering criteria with free scale parameters and unknown cluster sizes are inclined to create small, spurious clusters. To mitigate this tendency a statistical model for cardinality-constrained clustering of data with gross outliers is established, its maximum likelihood and maximum a posteriori clustering criteria are derived, and their consistency and robustness are analyzed. The criteria lead to constrained optimization problems that can be solved by using iterative, alternating trimming algorithms of k-means type. Each step in the algorithms requires the solution of a λ-assignment problem known from combinatorial optimization. The method allows one to estimate the numbers of clusters and outliers. It is illustrated with a synthetic data set and a real one.
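
The alternating structure described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it assumes squared Euclidean distance instead of the likelihood-based criteria, it imposes no cardinality constraints, and it replaces the λ-assignment step with a plain nearest-center assignment followed by trimming of the farthest points. The function name `trimmed_kmeans` and the parameters `n_retain` and `init` are hypothetical, chosen only for this illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, k, n_retain, init=None, n_iter=50, seed=0):
    """Simplified trimmed k-means (an illustrative stand-in, see lead-in).

    Returns (labels, centers); the label -1 marks a trimmed point.
    """
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        # (Assumption: the paper solves a lambda-assignment problem here.)
        nearest = []
        for i, p in enumerate(points):
            d, j = min((sq_dist(p, c), j) for j, c in enumerate(centers))
            nearest.append((d, i, j))

        # Trimming step: retain the n_retain closest points, discard the rest.
        nearest.sort()
        new_labels = [-1] * len(points)
        for d, i, j in nearest[:n_retain]:
            new_labels[i] = j

        # Update step: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [points[i] for i, lab in enumerate(new_labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[t] for m in members) / len(members) for t in range(dim))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        centers = new_centers

        # Stop once the partition no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Two tight groups plus one gross outlier; retain 6 of the 7 points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (100, 100)]
labels, centers = trimmed_kmeans(data, k=2, n_retain=6, init=[(0, 0), (10, 10)])
```

With the explicit starting centers above, the outlier at `(100, 100)` is the point that gets trimmed. Because nothing in this sketch enforces a minimum cluster size, a poor start can still leave a lone outlier as its own cluster, which is the kind of spurious solution the cardinality constraints in the abstract are meant to rule out.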