Article ID Journal Published Year Pages File Type
405974 Neurocomputing 2016 10 Pages PDF
Abstract

•Scores from the metrics applied to establish topic quality do not correlate with the reported results.
•Previous work used different latent semantic algorithms and evaluated them at an aggregated, not disaggregated, level.
•We show that the applied metrics make it easier to discriminate the usefulness of topics.
•We found that Bayesian methods yielded more interpretable topics than other methods for developing business applications.
•Accounting for the power-law property in a latent semantic model increases the interpretability of the extracted topics.

In recent years, topic models have gained popularity for classifying text from several web sources (from social networks to digital media). However, after many years of working in web text mining, we have noticed that assessing the quality of the discovered topics is still an open and hard problem. In this paper, we evaluate four latent semantic models using the two most widely used metrics: coherence and interpretability. We show how these purely mathematical metrics fall short of assessing topic quality. Experiments were performed over a dataset of 21,863 complaint texts.
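As a minimal sketch of the kind of coherence metric discussed above, the following implements the UMass document co-occurrence variant (one standard formulation; the abstract does not specify which coherence variant the paper uses, so this is illustrative only). It scores a topic's top words by how often word pairs co-occur in the same document.

```python
from math import log

def umass_coherence(topic_words, documents):
    """UMass coherence for one topic: sum over ordered word pairs (w_m, w_l),
    l < m, of log((D(w_m, w_l) + 1) / D(w_l)), where D counts the documents
    containing the given word(s). Higher (closer to 0) means more coherent."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            w_m, w_l = topic_words[m], topic_words[l]
            score += log((doc_freq(w_m, w_l) + 1) / doc_freq(w_l))
    return score

# Toy illustration (hypothetical data, not the paper's dataset):
docs = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["bird"]]
coherent = umass_coherence(["cat", "dog"], docs)    # words often co-occur
incoherent = umass_coherence(["cat", "bird"], docs)  # words never co-occur
```

On this toy corpus the co-occurring pair scores higher than the pair that never appears together, which is exactly the signal such metrics rely on; the paper's point is that this purely statistical signal can still fail to track the usefulness of topics as judged by humans.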

Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence