Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
485809 | Procedia Computer Science | 2015 | 8 Pages |
Abstract
A text clustering algorithm, FC-DM, is proposed to overcome a drawback of division-based clustering methods: their sensitivity to the estimated number of classes. Complex features, including synonyms and co-occurring words, are extracted to build a feature space that carries more semantic information. A divide-and-merge strategy then helps the iteration converge to a reasonable number of clusters. Experimental results show that dynamically updating the number of centers prevents the clustering result from deteriorating when k deviates from the real number of classes. The difference between FC-DM and k-means is most pronounced when k is too small or too large, and FC-DM also outperforms the other benchmark algorithms.
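The abstract does not give the details of the divide and merge strategy, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes plain Euclidean vectors instead of the paper's synonym and co-occurrence features, and the names `divide_and_merge`, `split_radius`, and `merge_dist` are hypothetical: a cluster is divided when its RMS radius exceeds `split_radius`, and two centers are merged when they lie closer than `merge_dist`.

```python
import numpy as np

def kmeans(X, centers, iters=20):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iters):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers, dropping any center that lost all its points.
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers)) if np.any(labels == j)])
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

def divide_and_merge(X, k, split_radius, merge_dist, rounds=10, seed=0):
    """Hypothetical divide-and-merge loop around k-means (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(rounds):
        centers, labels = kmeans(X, centers)
        changed = False
        # Divide: split clusters that are too spread out.
        divided = []
        for j, c in enumerate(centers):
            pts = X[labels == j]
            if len(pts) == 0:
                changed = True          # discard an unused center
                continue
            rms = np.sqrt(((pts - c) ** 2).sum(axis=1).mean())
            if len(pts) >= 2 and rms > split_radius:
                # Place two new centers along the direction of largest variance.
                direction = np.linalg.svd(pts - c, full_matrices=False)[2][0]
                divided += [c + rms * direction, c - rms * direction]
                changed = True
            else:
                divided.append(c)
        # Merge: greedily fuse centers that are too close together.
        merged = []
        for c in divided:
            for i, m in enumerate(merged):
                if np.linalg.norm(c - m) < merge_dist:
                    merged[i] = (m + c) / 2
                    changed = True
                    break
            else:
                merged.append(c)
        centers = np.array(merged)
        if not changed:
            break
    return kmeans(X, centers)

# Synthetic data (an assumption for illustration): three well-separated blobs.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(loc=m, scale=0.3, size=(50, 2))
               for m in [(0, 0), (10, 0), (0, 10)]])

# Start from a k that is too small and a k that is too large.
centers_small, labels_small = divide_and_merge(X, k=1, split_radius=1.0, merge_dist=2.0)
centers_large, labels_large = divide_and_merge(X, k=8, split_radius=1.0, merge_dist=2.0)
print(len(centers_small), len(centers_large))
```

On toy data like this, both runs should settle on the same number of centers as there are blobs, even though the initial k was wrong in each case. This mirrors the abstract's claim about robustness to a poorly estimated k, but it says nothing about the paper's actual results. The two thresholds are tuning parameters of this sketch and would need to be chosen for real text features.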
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)