A document clustering algorithm for discovering and describing topics

Journal	Published Year	Pages
Pattern Recognition Letters	2010	9

Abstract

In this paper, we introduce a new clustering algorithm for discovering and describing the topics contained in a text collection. Our proposal relies on both the most probable term pairs generated from the collection and an estimate of the topic homogeneity associated with these pairs. Topics and their descriptions are generated from those term pairs whose support sets are homogeneous enough to represent collection topics. Experimental results obtained over three benchmark text collections demonstrate the effectiveness and utility of this new approach.
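The abstract does not say how term-pair probabilities or topic homogeneity are estimated, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that (1) a pair's support set is the set of documents containing both terms; (2) a pair's probability is approximated by the fraction of documents in its support set; (3) homogeneity is the mean pairwise cosine similarity of the support set's term-frequency vectors; and (4) pairs with identical support sets are merged into one topic whose description is the union of their terms. The function `discover_topics` and its parameters `top_k`, `min_support` and `min_homogeneity` are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def tf_vectors(docs):
    """Term-frequency vectors (assumption: lowercase whitespace tokenization)."""
    return [Counter(doc.lower().split()) for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def homogeneity(doc_ids, vecs):
    """Assumed homogeneity measure: mean pairwise cosine over the support set."""
    pairs = list(combinations(doc_ids, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)


def discover_topics(docs, top_k=10, min_support=2, min_homogeneity=0.3):
    """Hypothetical sketch: topics from probable, homogeneous term pairs."""
    vecs = tf_vectors(docs)
    n = len(docs)
    # Support set of each term pair: indices of documents containing both terms.
    support = {}
    for i, v in enumerate(vecs):
        for pair in combinations(sorted(v), 2):
            support.setdefault(pair, set()).add(i)
    # Assumed probability estimate: fraction of documents in the support set.
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(
        (p for p, s in support.items() if len(s) >= min_support),
        key=lambda p: (-len(support[p]) / n, p),
    )[:top_k]
    topics = {}
    for pair in ranked:
        members = frozenset(support[pair])
        h = homogeneity(sorted(members), vecs)
        if h < min_homogeneity:
            continue  # support set not homogeneous enough to represent a topic
        # Assumption: pairs sharing a support set describe the same topic.
        topic = topics.setdefault(members, {"terms": set(), "homogeneity": h})
        topic["terms"].update(pair)
    return [
        {"documents": sorted(m), "terms": sorted(t["terms"]), "homogeneity": t["homogeneity"]}
        for m, t in topics.items()
    ]


docs = [
    "stock market prices fall",
    "stock market prices rise",
    "football match goal scored",
    "football match ends goal",
]
for t in discover_topics(docs):
    print(t["terms"], t["documents"])
# prints:
# ['football', 'goal', 'match'] [2, 3]
# ['market', 'prices', 'stock'] [0, 1]
```

Merging pairs by identical support set is just one simple way to turn several descriptive pairs into a single topic description. Raising `min_homogeneity` discards pairs whose documents are too dissimilar. In this toy collection each support set has a homogeneity of 0.75, so a threshold of 0.9 yields no topics.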

Keywords

Document clustering; Topic discovery

Authors

Henry Anaya-Sánchez, Aurora Pons-Porrata, Rafael Berlanga-Llavori