Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
11004915 | 1480061 | 2018 | 13-page PDF | Free download |
English title of the ISI article
Application of Rényi and Tsallis entropies to topic modeling optimization
Related topics
Engineering and basic sciences
Mathematics
Mathematical physics

Abstract
This study proposes minimizing Rényi and Tsallis entropies to find the optimal number of topics T in topic modeling (TM). TM is a promising tool for extracting knowledge from large text collections, but its properties are underresearched; in particular, parameter optimization in such models has been hindered by the use of monotonic quality functions with no clear thresholds. In this research, topic models obtained from large text collections are viewed as nonequilibrium complex systems in which the number of topics is regarded as an equivalent of temperature. This allows calculating the free energy of such systems, a value through which both Rényi and Tsallis entropies are easily expressed. Numerical experiments with four TM algorithms and two text collections show that both entropies, as functions of the number of topics, yield clear minima in the middle of the range of T. On the marked-up dataset, the minima for three algorithms correspond to the value of T detected by humans. It is concluded that Tsallis and especially Rényi entropy can be used for T optimization instead of Shannon entropy, which decreases even when T becomes obviously excessive. Additionally, some algorithms are found to be better suited for revealing local entropy minima. Finally, we test whether the overall content of all topics taken together is resistant to changes in T and find that this dependence has a quasi-periodic structure, which demands further research.
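For readers unfamiliar with the two entropies named in the abstract, the sketch below computes the textbook Rényi and Tsallis entropies of a discrete probability distribution. It is only an illustration of the standard definitions, both of which depend on the same sum of p_i^q; it is not the paper's topic-model construction, which ties the deformation parameter and the free energy to the number of topics T. The function names and the sample distributions are assumptions made for this example.

```python
import math


def power_sum(p, q):
    """Sum of p_i**q over the nonzero probabilities (shared by both entropies)."""
    return sum(x ** q for x in p if x > 0)


def renyi_entropy(p, q):
    """Textbook Renyi entropy of order q (q > 0, q != 1), natural logarithm."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return math.log(power_sum(p, q)) / (1.0 - q)


def tsallis_entropy(p, q):
    """Textbook Tsallis entropy of order q (q > 0, q != 1)."""
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    return (1.0 - power_sum(p, q)) / (q - 1.0)


def shannon_entropy(p):
    """Shannon entropy, the q -> 1 limit of both entropies above."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Hypothetical distributions, chosen only to show the behaviour.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

for name, dist in (("uniform", uniform), ("peaked", peaked)):
    print(name,
          "Renyi(q=2):", renyi_entropy(dist, 2.0),
          "Tsallis(q=2):", tsallis_entropy(dist, 2.0),
          "Shannon:", shannon_entropy(dist))
```

A concentrated distribution gives lower values than a uniform one under all three measures, and both Rényi and Tsallis entropies approach Shannon entropy as q approaches 1.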
Publisher
Database: Elsevier - ScienceDirect
Journal: Physica A: Statistical Mechanics and its Applications - Volume 512, 15 December 2018, Pages 1192-1204
Authors
Sergei Koltcov,