Article code: 405974 | Journal code: 678051 | Year: 2016 | Full text: 10 pages (PDF)
Title
An empirical comparison of latent sematic models for applications in industry
Keywords
Human semantics, Text mining, Quality measurement, Evaluation assessment
Abstract


• Results of the metrics used to assess topic quality are not correlated with the reported results.
• Previous work used different latent semantic algorithms and evaluated them at an aggregated rather than a disaggregated level.
• We proved that the applied metrics make it easier to analyze and discriminate the usefulness of topics.
• We found that Bayesian methods were more interpretable than other methods for developing business applications.
• Accounting for the power-law property in a latent semantic model increases the interpretability of the extracted topics.

In recent years, topic models have gained popularity for classifying text from a variety of web sources (from social networks to digital media). However, after working for many years in web text mining, we have noticed that assessing the quality of the discovered topics is still an open problem, and a hard one to solve. In this paper, we evaluate four latent semantic models using the two most widely used metrics: coherence and interpretability. We show how these purely mathematical metrics fall short in assessing topic quality. Experiments were performed on a dataset of 21,863 text complaints.
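Coherence is one of the two metrics named above, but the abstract does not say which coherence formula the authors used. As a hedged illustration only, the sketch below computes the widely used UMass coherence (Mimno et al., 2011), $C(t) = \sum_{m=2}^{M}\sum_{l=1}^{m-1} \log\frac{D(v_m, v_l) + 1}{D(v_l)}$, where $v_1, \dots, v_M$ are the topic's top words and $D(\cdot)$ counts the documents containing the given words. It assumes each document is a collection of tokens; the function name `umass_coherence` is hypothetical, and this is not necessarily the measure used in the paper.

```python
import math
from typing import Iterable, Sequence


def umass_coherence(top_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """UMass coherence of one topic (a sketch; not necessarily the paper's metric).

    top_words: the topic's top-M words, most probable first.
    documents: the corpus, each document given as an iterable of tokens.
    """
    # Represent each document as a set so word-membership tests are fast.
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # Pair each word with every higher-ranked word (index j < i).
    for i in range(1, len(top_words)):
        for j in range(i):
            lower, higher = top_words[i], top_words[j]
            d_higher = doc_freq(higher)
            if d_higher == 0:
                continue  # skip words absent from the corpus to avoid division by zero
            # The +1 smoothing keeps the log defined when the pair never co-occurs.
            score += math.log((doc_freq(lower, higher) + 1) / d_higher)
    return score
```

Higher scores mean the topic's top words co-occur more often in the corpus. Such a score is exactly the kind of purely mathematical signal the abstract argues may not reflect how useful a topic is for a business application.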

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 179, 29 February 2016, Pages 176–185
Authors
, ,