Article code: 407833
Journal code: 678175
Publication year: 2012
English article: 9-page PDF
Full-text version: free download
English title of the ISI article
Topic model validation
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

In this paper the problem of performing external validation of the semantic coherence of topic models is considered. The Fowlkes–Mallows index, a known clustering validation metric, is generalized for the case of overlapping partitions and multi-labeled collections, thus making it suitable for validating topic modeling algorithms. In addition, we propose new probabilistic metrics inspired by the concepts of recall and precision. The proposed metrics also have clear probabilistic interpretations and can be applied to validate and compare other soft and overlapping clustering algorithms. The approach is exemplified by using the Reuters-21578 multi-labeled collection to validate LDA models, then using Monte Carlo simulations to show the convergence to the correct results. Additional statistical evidence is provided to better understand the relation of the metrics presented.
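
For orientation, the classic Fowlkes–Mallows index that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration for hard, non-overlapping partitions only. It is not the generalized metric proposed in the paper, whose definition the abstract does not give. The function name and the pair-counting approach are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Classic Fowlkes-Mallows index for two hard partitions.

    Illustrative sketch only: it does not handle overlapping
    partitions or multi-labeled documents.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    tp = fp = fn = 0
    # Count document pairs by whether each partition groups them together.
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # together in both partitions
        elif same_pred:
            fp += 1  # together only in the predicted partition
        elif same_true:
            fn += 1  # together only in the reference partition

    if tp == 0:
        return 0.0

    pair_precision = tp / (tp + fp)
    pair_recall = tp / (tp + fn)
    # The index is the geometric mean of pairwise precision and recall.
    return sqrt(pair_precision * pair_recall)


if __name__ == "__main__":
    reference = [0, 0, 0, 1]
    predicted = [0, 0, 1, 1]
    print(fowlkes_mallows(reference, predicted))
```

Because the index is the geometric mean of pairwise precision and pairwise recall, it connects naturally to the precision- and recall-inspired metrics the abstract mentions.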

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 76, Issue 1, 15 January 2012, Pages 125–133