Article ID | Journal | Published Year | Pages | File Type
--- | --- | --- | --- | ---
4969605 | Pattern Recognition | 2017 | 15 |
Abstract
With the exponential growth of text documents on the web, there is a genuine need for techniques that simultaneously organize terms and documents into meaningful clusters, thereby making large datasets easier to handle and interpret. Several block-diagonal clustering algorithms have proven successful in identifying co-clusters of documents and words. However, despite their effectiveness, most of the existing methods do not provide a parameterizable model for tackling the problem of block-diagonal identification. In this paper, we rely on mixture models, which offer strong theoretical foundations and considerable flexibility. More precisely, we propose a parsimonious latent block model based on the mixture of Poisson distributions and tailored for sparse, high-dimensional data such as document-term matrices. In order to efficiently estimate the model parameters, we derive two scalable co-clustering algorithms based on variational inference. Empirical results obtained on several real-world text datasets highlight the advantages of the proposed model and the corresponding co-clustering algorithms.
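To give a rough sense of the kind of model the abstract refers to, the sketch below fits a plain (non-parsimonious) Poisson latent block model to a dense document-term count matrix using a hard, classification-EM-style alternating update rather than the paper's variational algorithms. The function name `poisson_lbm_coclustering`, the initialization, and all parameter choices are illustrative assumptions, not the authors' method.

```python
import numpy as np

def poisson_lbm_coclustering(X, n_row_clusters, n_col_clusters,
                             n_iter=30, seed=0):
    """Minimal sketch: co-cluster a count matrix with a plain Poisson
    latent block model, fitted by hard alternating assignments.
    NOTE: illustrative only; this is not the parsimonious model or the
    variational algorithms proposed in the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(n_row_clusters, size=n)   # row (document) cluster labels
    w = rng.integers(n_col_clusters, size=d)   # column (term) cluster labels

    for _ in range(n_iter):
        # M step: block rate gamma[k, l] = mean count inside block (k, l)
        R = np.eye(n_row_clusters)[z]          # n x g one-hot row memberships
        C = np.eye(n_col_clusters)[w]          # d x m one-hot column memberships
        block_sums = R.T @ X @ C
        block_sizes = np.outer(R.sum(0), C.sum(0)) + 1e-12
        gamma = block_sums / block_sizes + 1e-12

        # Row step: Poisson log-likelihood of each row under each row cluster
        XC = X @ C                             # n x m counts per column cluster
        row_ll = XC @ np.log(gamma).T - C.sum(0) @ gamma.T
        z = row_ll.argmax(axis=1)

        # Column step, symmetrically
        R = np.eye(n_row_clusters)[z]
        XtR = X.T @ R                          # d x g counts per row cluster
        col_ll = XtR @ np.log(gamma) - R.sum(0) @ gamma
        w = col_ll.argmax(axis=1)

    return z, w
```

For instance, calling `poisson_lbm_coclustering(X, n_row_clusters=3, n_col_clusters=4)` on a synthetic matrix `X = np.random.poisson(1.0, size=(200, 300))` returns a row-cluster label per document and a column-cluster label per term; reordering the matrix by these labels exposes the block structure that block-diagonal co-clustering methods aim to recover.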
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Melissa Ailem, François Role, Mohamed Nadif