Article code: 558481
Journal code: 874939
Publication year: 2011
English article: 20 pages, PDF
Full-text version: free download
English title of the ISI article
RPLSA: A novel updating scheme for Probabilistic Latent Semantic Analysis
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract

A novel updating method for Probabilistic Latent Semantic Analysis (PLSA), called Recursive PLSA (RPLSA), is proposed. The updating of conditional probabilities is derived from first principles for both the asymmetric and the symmetric PLSA formulations. The performance of RPLSA for both formulations is compared to that of the PLSA folding-in, the PLSA rerun from the breakpoint, and well-known LSA updating methods, such as the singular value decomposition (SVD) folding-in and the SVD-updating. The experimental results demonstrate that the RPLSA outperforms the other updating methods under study with respect to the maximization of the average log-likelihood and the minimization of the average absolute error between the probabilities estimated by the updating methods and those derived by applying the non-adaptive PLSA from scratch. A comparison in terms of CPU run time is conducted as well. Finally, in document clustering using the Adjusted Rand index, it is demonstrated that the clusters generated by the RPLSA are: (a) similar to those generated by the PLSA applied from scratch; (b) closer to the ground truth than those created by the other PLSA or LSA updating methods.
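The two accuracy criteria named in the abstract can be illustrated with a short sketch. It assumes the asymmetric formulation P(d, w) = P(d) Σ_z P(w|z) P(z|d) and averages the log-likelihood over all word occurrences; the exact normalization used in the paper is not stated here, so that choice, like the function and argument names, is an assumption.

```python
import math


def average_log_likelihood(counts, p_d, p_w_given_z, p_z_given_d):
    """Average log-likelihood per word occurrence (normalization is an assumption).

    counts[d][w]      -- n(d, w), occurrences of word w in document d
    p_d[d]            -- P(d)
    p_w_given_z[z][w] -- P(w | z)
    p_z_given_d[d][z] -- P(z | d)
    """
    n_topics = len(p_w_given_z)
    total_ll = 0.0
    total_count = 0
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw == 0:
                continue  # zero counts contribute nothing to the sum
            p_w_d = sum(p_w_given_z[z][w] * p_z_given_d[d][z]
                        for z in range(n_topics))
            total_ll += n_dw * math.log(p_d[d] * p_w_d)
            total_count += n_dw
    return total_ll / total_count


def average_absolute_error(p_est, p_ref):
    """Mean absolute difference between two probability tables of equal shape."""
    est = [v for row in p_est for v in row]
    ref = [v for row in p_ref for v in row]
    return sum(abs(a - b) for a, b in zip(est, ref)) / len(est)
```

Here `p_ref` would hold the probabilities obtained by running PLSA from scratch on the enlarged collection, and `p_est` those produced by an updating method.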

Research highlights
► A novel updating method for PLSA (RPLSA), when new documents are added incrementally to an initial document collection, is proposed.
► Both the asymmetric and the symmetric PLSA formulations are examined.
► Two initialization schemes for the probability distribution of the added documents are tested.
► RPLSA outperforms the established PLSA and LSA updating methods in terms of accuracy and in document clustering performance.
► RPLSA is more time consuming than the PLSA folding-in but less time consuming than the PLSA rerun from the breakpoint.
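For context, the PLSA folding-in baseline mentioned in the highlights can be sketched as follows. This is the standard folding-in idea for the asymmetric formulation, not the RPLSA update itself: the word-topic probabilities P(w|z) learned on the initial collection stay fixed, and only P(z|d) of the added document is estimated by EM. The uniform initialization, the iteration count, and all names are assumptions made for illustration.

```python
def fold_in(doc_counts, p_w_given_z, n_iter=50):
    """Estimate P(z | d_new) for one added document, keeping P(w | z) fixed.

    doc_counts[w]     -- n(d_new, w), word counts of the added document
    p_w_given_z[z][w] -- P(w | z) from the already trained model
    """
    n_topics = len(p_w_given_z)
    p_z = [1.0 / n_topics] * n_topics  # uniform start (an assumption)
    for _ in range(n_iter):
        new_p_z = [0.0] * n_topics
        for w, n_w in enumerate(doc_counts):
            if n_w == 0:
                continue
            # E-step: P(z | d_new, w) is proportional to P(w | z) P(z | d_new)
            joint = [p_w_given_z[z][w] * p_z[z] for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for z in range(n_topics):
                new_p_z[z] += n_w * joint[z] / norm
        total = sum(new_p_z)
        if total == 0.0:
            break  # no usable words; keep the previous estimate
        # M-step: renormalize the expected counts into P(z | d_new)
        p_z = [v / total for v in new_p_z]
    return p_z
```

For example, `fold_in([10, 0], [[0.9, 0.1], [0.1, 0.9]])` assigns nearly all probability mass to the first topic, since the document contains only the word that topic favors.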

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 25, Issue 4, October 2011, Pages 741–760