Article ID: 4969605
Journal: Pattern Recognition
Published Year: 2017
Pages: 15
File Type: PDF
Abstract
With the exponential growth of text documents on the web, there is a genuine need for techniques that simultaneously organize terms and documents into meaningful clusters, thereby making large datasets easier to handle and interpret. Several block diagonal clustering algorithms have proven successful in identifying co-clusters of documents and words. However, despite their effectiveness, most existing methods do not provide a parameterizable model for tackling the problem of block diagonal identification. In this paper, we rely on mixture models, which offer strong theoretical foundations and considerable flexibility. More precisely, we propose a parsimonious latent block model based on a mixture of Poisson distributions and tailored for sparse, high-dimensional data such as document-term matrices. To estimate the model parameters efficiently, we derive two scalable co-clustering algorithms based on variational inference. Empirical results obtained on several real-world text datasets highlight the advantages of the proposed model and the corresponding co-clustering algorithms.
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition