Article ID | Journal | Published Year | Pages |
---|---|---|---|
437425 | Theoretical Computer Science | 2016 | 12 |
Abstract
The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial k centers when using Lloyd's algorithm for the k-means problem. Brunsch and Röglin [9] conjectured that k-means++ behaves well on datasets of small dimension. More specifically, they conjectured that the k-means++ seeding algorithm gives an O(log d) approximation with high probability for any d-dimensional dataset. In this work, we refute this conjecture by giving two-dimensional datasets on which the k-means++ seeding algorithm achieves an O(log k) approximation ratio only with probability exponentially small in k. This solves open problems posed by Mahajan et al. [12] and by Brunsch and Röglin [9].
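The abstract does not spell out the seeding procedure, so the following is only a minimal sketch of k-means++ seeding as it is commonly described (D² sampling): the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The names `sq_dist`, `kmeanspp_seed`, and `kmeans_cost` are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's lower-bound construction.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform choice.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
    seeds = kmeanspp_seed(pts, 3, random.Random(1))
    print(seeds, kmeans_cost(pts, seeds))
```

The approximation ratio discussed in the abstract compares the cost of the sampled centers (here, the value returned by `kmeans_cost`) against the cost of an optimal set of k centers.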
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Anup Bhattacharya, Ragesh Jaiswal, Nir Ailon