Article code: 4942055
Journal code: 1436982
Publication year: 2017
English article: 48-page PDF
Full-text version: Free download
English title of the ISI article
Latent tree models for hierarchical topic detection
Persian translation of the title
مدل‌های درخت پنهان برای تشخیص موضوع سلسله‌مراتبی
Keywords
Probabilistic graphical models, text analysis, hierarchical latent tree analysis, hierarchical topic detection
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies.
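The two building blocks named in the abstract, binary word variables and a soft partition induced by a binary latent variable, can be illustrated with a minimal sketch. It is not the authors' learning algorithm: it assumes a single latent variable Z whose word children are conditionally independent given Z, and the vocabulary, prior, and conditional probabilities below are made-up illustrative values. The function names `binarize` and `soft_membership` are hypothetical.

```python
import math

def binarize(doc_tokens, vocabulary):
    """Return a presence/absence vector: 1 if the word occurs in the document, else 0."""
    present = set(doc_tokens)
    return [1 if word in present else 0 for word in vocabulary]

def soft_membership(x, prior, p_word_given_z):
    """Return P(Z=1 | x) for one binary latent variable Z.

    Assumes the binary word variables are conditionally independent given Z.
    prior[z] is P(Z=z); p_word_given_z[z][i] is P(word_i present | Z=z).
    """
    log_joint = []
    for z in (0, 1):
        lp = math.log(prior[z])
        for xi, p in zip(x, p_word_given_z[z]):
            lp += math.log(p if xi else 1.0 - p)
        log_joint.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]
    return weights[1] / (weights[0] + weights[1])

# Illustrative (assumed) parameters, not taken from the paper.
vocabulary = ["nasa", "space", "orbit"]
prior = [0.8, 0.2]
p_word_given_z = [
    [0.02, 0.05, 0.01],  # Z=0: words are rare
    [0.60, 0.70, 0.40],  # Z=1: words tend to co-occur
]

docs = [["nasa", "launch", "space"], ["hockey", "game"]]
for tokens in docs:
    x = binarize(tokens, vocabulary)
    print(x, round(soft_membership(x, prior, p_word_given_z), 3))
```

Each document receives a membership probability rather than a hard label, which is what makes the partition soft; documents with a high probability for Z=1 would form the cluster read as a topic.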
Publisher
Database: Elsevier - ScienceDirect
Journal: Artificial Intelligence - Volume 250, September 2017, Pages 105-124