RPLSA: A novel updating scheme for Probabilistic Latent Semantic Analysis

Article code	Journal code	Publication year	English article	Full-text version
558481	874939	2011	20-page PDF	Free download

Keywords

PLSA; Information retrieval; Expectation Maximization; Document clustering

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing

Abstract

A novel updating method for Probabilistic Latent Semantic Analysis (PLSA), called Recursive PLSA (RPLSA), is proposed. The updating of conditional probabilities is derived from first principles for both the asymmetric and the symmetric PLSA formulations. The performance of RPLSA for both formulations is compared to that of the PLSA folding-in, the PLSA rerun from the breakpoint, and well-known LSA updating methods, such as the singular value decomposition (SVD) folding-in and the SVD-updating. The experimental results demonstrate that the RPLSA outperforms the other updating methods under study with respect to the maximization of the average log-likelihood and the minimization of the average absolute error between the probabilities estimated by the updating methods and those derived by applying the non-adaptive PLSA from scratch. A comparison in terms of CPU run time is conducted as well. Finally, in document clustering using the Adjusted Rand index, it is demonstrated that the clusters generated by the RPLSA are: (a) similar to those generated by the PLSA applied from scratch; (b) closer to the ground truth than those created by the other PLSA or LSA updating methods.
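As a point of reference for the baselines named in the abstract, the following is a minimal sketch of standard asymmetric PLSA trained with EM, together with PLSA folding-in and an average log-likelihood measure. It is not the RPLSA update itself, which this page does not give. The function names (`fit_plsa`, `fold_in`, `avg_log_likelihood`), the dense count-list input, the small smoothing constant, and the per-token averaging of the log-likelihood are all assumptions made for illustration and may differ from the paper's definitions.

```python
import math
import random

EPS = 1e-12  # assumed smoothing constant to avoid zero probabilities


def normalize(v):
    """Scale a non-negative vector so it sums to 1 (with tiny smoothing)."""
    s = sum(v) + EPS * len(v)
    return [(x + EPS) / s for x in v]


def em_step_doc(counts, p_w_z, p_z_d):
    """One EM pass over a single document.

    Returns the updated P(z|d) and the expected counts n(d,w) * P(z|d,w).
    """
    K, V = len(p_w_z), len(counts)
    expected = [[0.0] * V for _ in range(K)]
    for w, n in enumerate(counts):
        if n == 0:
            continue
        # E-step: posterior P(z|d,w) is proportional to P(w|z) * P(z|d)
        post = normalize([p_w_z[z][w] * p_z_d[z] for z in range(K)])
        for z in range(K):
            expected[z][w] = n * post[z]
    # M-step for this document's topic mixture
    return normalize([sum(row) for row in expected]), expected


def fit_plsa(docs, K, iters=50, seed=0):
    """Train asymmetric PLSA from scratch; docs is a list of word-count lists."""
    rng = random.Random(seed)
    V = len(docs[0])
    p_w_z = [normalize([rng.random() for _ in range(V)]) for _ in range(K)]
    p_z_d = [normalize([rng.random() for _ in range(K)]) for _ in docs]
    for _ in range(iters):
        acc = [[0.0] * V for _ in range(K)]
        for d, counts in enumerate(docs):
            p_z_d[d], expected = em_step_doc(counts, p_w_z, p_z_d[d])
            for z in range(K):
                for w in range(V):
                    acc[z][w] += expected[z][w]
        # M-step for the topic-word distributions P(w|z)
        p_w_z = [normalize(row) for row in acc]
    return p_w_z, p_z_d


def fold_in(counts, p_w_z, iters=50):
    """Folding-in: estimate P(z|d_new) while keeping P(w|z) fixed."""
    K = len(p_w_z)
    p_z_d = [1.0 / K] * K
    for _ in range(iters):
        p_z_d, _ = em_step_doc(counts, p_w_z, p_z_d)
    return p_z_d


def avg_log_likelihood(docs, p_w_z, p_z_d):
    """Log-likelihood of the observed words, averaged per token (assumed)."""
    total, n_tokens = 0.0, 0
    for counts, mix in zip(docs, p_z_d):
        for w, n in enumerate(counts):
            if n:
                p = sum(p_w_z[z][w] * mix[z] for z in range(len(mix)))
                total += n * math.log(p)
                n_tokens += n
    return total / n_tokens


if __name__ == "__main__":
    docs = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
    p_w_z, p_z_d = fit_plsa(docs, K=2)
    new_doc = [3, 3, 0, 0]
    print(fold_in(new_doc, p_w_z))
    print(avg_log_likelihood(docs, p_w_z, p_z_d))
```

Folding-in leaves the word distributions untouched, which is why it is cheap but can drift from a model retrained on the enlarged collection. That drift is what the abstract's average absolute error comparison measures.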

Research highlights
► A novel updating method for PLSA (RPLSA) is proposed for the case where new documents are added incrementally to an initial document collection.
► Both the asymmetric and the symmetric PLSA formulations are examined.
► Two initialization schemes for the probability distribution of the added documents are tested.
► RPLSA outperforms the established PLSA and LSA updating methods in terms of accuracy and document clustering performance.
► RPLSA is more time consuming than the PLSA folding-in but less time consuming than the PLSA rerun from the breakpoint.
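
For reference, the asymmetric and symmetric formulations mentioned in the highlights are commonly written as below in the general PLSA literature. The notation is an assumption for illustration and may differ from the paper's: d is a document, w a word, z a latent topic, and n(d, w) the number of times w occurs in d.

```latex
\begin{align}
  % Asymmetric formulation: each document has its own topic mixture P(z | d)
  P(d, w) &= P(d) \sum_{z} P(w \mid z)\, P(z \mid d) \\
  % Symmetric formulation: documents and words are both generated from the topic
  P(d, w) &= \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z) \\
  % Log-likelihood maximized by EM under either formulation
  \mathcal{L} &= \sum_{d} \sum_{w} n(d, w) \log P(d, w)
\end{align}
```

The two forms are related by Bayes' rule, since P(d) P(z | d) = P(z) P(d | z), so they describe the same joint distribution with different parameters.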

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 25, Issue 4, October 2011, Pages 741–760

Authors

N. Bassiou, C. Kotropoulos
