Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4942063 | Artificial Intelligence | 2017 | 15 Pages |
Abstract
Hierarchical clustering is a popular approach in a number of fields with many well known algorithms. However, all existing work to our knowledge implements a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as linear constraints. Our experimental work shows that even for small data sets finding the global optimum produces more accurate results. Formalizing hierarchical clustering as an ILP with constraints has several advantages beyond finding the global optima. Relaxing the dendrogram constraints such as transitivity can produce novel problem variations such as finding hierarchies with overlapping clusterings. It is also possible to add constraints to encode guidance such as must-link, cannot-link, must-link-before etc. Finally, though exact solvers exist for ILP we show that a simple randomized algorithm and a linear programming (LP) relaxation can be used to provide approximate solutions faster.
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Sean Gilpin, Ian Davidson,