Article code: 382426
Journal code: 660761
Publication year: 2015
English article: 13 pages
Full-text version: PDF (free download)
English title of the ISI article
An analysis of the coherence of descriptors in topic modeling
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract


• We evaluate the coherence and generality of topic descriptors found by LDA and NMF.
• Six new and existing corpora were specifically compiled for this evaluation.
• A new coherence measure using word2vec-modeled term vector similarity is proposed.
• NMF regularly produces more coherent topics, where term weighting is influential.
• NMF may be more suitable for topic modeling of niche or non-mainstream corpora.
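
The highlights mention a coherence measure based on word2vec term vector similarity, but this page does not give its formula. As a minimal sketch, assuming coherence is taken as the mean pairwise cosine similarity between the vectors of a topic's top terms, the idea could look like the following. The function names and the three-dimensional toy vectors are hypothetical and invented for illustration; they are not real word2vec embeddings and not the paper's data.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def descriptor_coherence(terms, vectors):
    """Mean pairwise cosine similarity over the terms that have a vector.

    Assumption: this averaging scheme is one plausible reading of a
    "term vector similarity" coherence score, not the paper's stated formula.
    """
    known = [t for t in terms if t in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


# Toy vectors made up for this example (hypothetical, not trained embeddings).
toy_vectors = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "league": [0.7, 0.3, 0.0],
    "tax": [0.0, 0.2, 0.9],
}

coherent = descriptor_coherence(["goal", "match", "league"], toy_vectors)
mixed = descriptor_coherence(["goal", "match", "tax"], toy_vectors)
print(coherent > mixed)
```

In practice the `vectors` mapping would come from a trained embedding model; a plain dictionary is used here so the sketch has no external dependencies.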

In recent years, topic modeling has become an established method in the analysis of text corpora, with probabilistic techniques such as latent Dirichlet allocation (LDA) commonly employed for this purpose. However, it might be argued that adequate attention is often not paid to the issue of topic coherence, the semantic interpretability of the top terms usually used to describe discovered topics. Nevertheless, a number of studies have proposed measures for analyzing such coherence, where these have been largely focused on topics found by LDA, with matrix decomposition techniques such as Non-negative Matrix Factorization (NMF) being somewhat overlooked in comparison. This motivates the current work, where we compare and analyze topics found by popular variants of both NMF and LDA in multiple corpora in terms of both their coherence and associated generality, using a combination of existing and new measures, including one based on distributional semantics. Two out of three coherence measures find NMF to regularly produce more coherent topics, with higher levels of generality and redundancy observed with the LDA topic descriptors. In all cases, we observe that the associated term weighting strategy plays a major role. The results observed with NMF suggest that this may be a more suitable topic modeling method when analyzing certain corpora, such as those associated with niche or non-mainstream domains.
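
The abstract reports higher generality and redundancy among the LDA topic descriptors but does not define how these were measured. As a rough sketch, assuming redundancy is approximated by the mean pairwise Jaccard overlap between the top-term lists of different topics, it could be computed as follows. The function names and the example descriptors are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two term lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def descriptor_redundancy(descriptors):
    """Mean pairwise Jaccard overlap across all topic descriptors.

    Assumption: this is a stand-in for the paper's redundancy analysis,
    whose exact definition is not given in the abstract.
    """
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Invented descriptors: two topics sharing generic terms vs. two distinct topics.
overlapping = [
    ["people", "time", "year", "game"],
    ["people", "time", "year", "market"],
]
distinct = [
    ["goal", "match", "league", "coach"],
    ["tax", "budget", "market", "bank"],
]

print(descriptor_redundancy(overlapping) > descriptor_redundancy(distinct))
```

A higher score means the topics reuse the same top terms, which is the kind of overlap the abstract associates with more general descriptors.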

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 42, Issue 13, 1 August 2015, Pages 5645–5657