Article ID Journal Published Year Pages File Type
405974 Neurocomputing 2016 10 Pages PDF
Abstract

•Scores from the metrics applied to establish topic quality do not correlate with the reported results.
•Previous work used different latent semantic algorithms and evaluated them at an aggregated, not disaggregated, level.
•We show that the applied metrics make it easier to discriminate the usefulness of topics.
•We found that Bayesian methods yielded more interpretable topics than other methods for developing business applications.
•Accounting for the power-law property in a latent semantic model increases the interpretability of the extracted topics.

In recent years, topic models have gained popularity for classifying text from several web sources (from social networks to digital media). However, after many years of working in web text mining, we have noticed that assessing the quality of the discovered topics is still an open and hard problem. In this paper, we evaluate four latent semantic models using the two most widely used metrics: coherence and interpretability. We show how these purely mathematical metrics fall short of assessing topic quality. Experiments were performed over a dataset of 21,863 complaint texts.
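As a minimal sketch of the kind of coherence metric discussed above, the following implements the UMass document co-occurrence variant (one standard formulation; the abstract does not specify which coherence variant the paper uses, so this is illustrative only). It scores a topic's top words by how often word pairs co-occur in the same document.

```python
from math import log

def umass_coherence(topic_words, documents):
    """UMass coherence for one topic: sum over ordered word pairs (w_m, w_l),
    l < m, of log((D(w_m, w_l) + 1) / D(w_l)), where D counts the documents
    containing the given word(s). Higher (closer to 0) means more coherent."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            w_m, w_l = topic_words[m], topic_words[l]
            score += log((doc_freq(w_m, w_l) + 1) / doc_freq(w_l))
    return score

# Toy illustration (hypothetical data, not the paper's dataset):
docs = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["bird"]]
coherent = umass_coherence(["cat", "dog"], docs)    # words often co-occur
incoherent = umass_coherence(["cat", "bird"], docs)  # words never co-occur
```

On this toy corpus the co-occurring pair scores higher than the pair that never appears together, which is exactly the signal such metrics rely on; the paper's point is that this purely statistical signal can still fail to track the usefulness of topics as judged by humans.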

Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence