Article code: 1151241
Journal code: 958204
Publication year: 2011
Full text (English): 15 pages, PDF, free download
English title of the ISI article
Asynchronous distributed estimation of topic models for document analysis
Related subjects
Engineering and Basic Sciences, Mathematics, Statistics and Probability
English abstract

Given the prevalence of large data sets and the availability of inexpensive parallel computing hardware, there is significant motivation to explore distributed implementations of statistical learning algorithms. In this paper, we present a distributed learning framework for Latent Dirichlet Allocation (LDA), a well-known Bayesian latent variable model for sparse matrices of count data. In the proposed approach, data are distributed across P processors, and processors independently perform inference on their local data and communicate their sufficient statistics in a local asynchronous manner with other processors. We apply two different approximate inference techniques for LDA, collapsed Gibbs sampling and collapsed variational inference, within a distributed framework. The results show significant improvements in computation time and memory when running the algorithms on very large text corpora using parallel hardware. Despite the approximate nature of the proposed approach, simulations suggest that asynchronous distributed algorithms are able to learn models that are nearly as accurate as those learned by the standard non-distributed approaches. We also find that our distributed algorithms converge rapidly to good solutions.

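The scheme the abstract describes (documents partitioned across P processors, each running collapsed Gibbs sampling for LDA on its local shard while exchanging topic-word sufficient statistics with peers) might be sketched roughly as follows. This is a minimal single-process simulation, not the paper's implementation: the toy corpus, the hyperparameters, and the round-robin delta exchange between two workers are illustrative assumptions.

```python
import random

# Illustrative sketch (NOT the paper's code): P workers each hold a shard of
# documents plus a local copy of the global topic-word counts. Each worker
# runs a collapsed Gibbs sweep on its own shard, then pushes the resulting
# count *deltas* (changes in sufficient statistics) to its peer, which merges
# them in asynchronously. Staleness between syncs is what makes it approximate.

K, V, P = 2, 6, 2          # topics, vocabulary size, workers (assumed values)
ALPHA, BETA = 0.5, 0.1     # symmetric Dirichlet hyperparameters (assumed)
rng = random.Random(0)

# toy corpus of word-id lists, partitioned across the two workers
corpus = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 4, 5]]
shards = [corpus[:2], corpus[2:]]

# random initial topic assignments per worker
zs = [[[rng.randrange(K) for _ in doc] for doc in docs] for docs in shards]

# true global sufficient statistics at initialization
gw = [[0] * K for _ in range(V)]   # topic-word counts
gt = [0] * K                       # topic totals
ndts = []                          # per-worker document-topic counts
for docs, z in zip(shards, zs):
    ndt = [[0] * K for _ in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            gw[w][k] += 1; gt[k] += 1; ndt[d][k] += 1
    ndts.append(ndt)

# each worker starts from its own copy of the global counts
workers = [{"docs": shards[p], "z": zs[p], "ndt": ndts[p],
            "nwt": [row[:] for row in gw], "nt": gt[:]} for p in range(P)]

def gibbs_sweep(s):
    """One collapsed Gibbs sweep over a worker's shard; returns count deltas."""
    delta = [[0] * K for _ in range(V)]
    for d, doc in enumerate(s["docs"]):
        for i, v in enumerate(doc):
            k = s["z"][d][i]
            # remove the token's current assignment from the counts
            s["nwt"][v][k] -= 1; s["nt"][k] -= 1
            s["ndt"][d][k] -= 1; delta[v][k] -= 1
            # conditional p(z = j | rest), up to a constant
            probs = [(s["nwt"][v][j] + BETA) / (s["nt"][j] + V * BETA)
                     * (s["ndt"][d][j] + ALPHA) for j in range(K)]
            r = rng.random() * sum(probs)
            for j in range(K):
                r -= probs[j]
                if r <= 0:
                    k = j
                    break
            s["z"][d][i] = k
            s["nwt"][v][k] += 1; s["nt"][k] += 1
            s["ndt"][d][k] += 1; delta[v][k] += 1
    return delta

for sweep in range(100):
    for p, state in enumerate(workers):
        delta = gibbs_sweep(state)
        # push the change in sufficient statistics to the peer, which merges
        # it into its own (possibly stale) copy of the global counts
        peer = workers[(p + 1) % P]
        for v in range(V):
            for j in range(K):
                peer["nwt"][v][j] += delta[v][j]
                peer["nt"][j] += delta[v][j]
```

After the final exchange both workers' count copies agree with the true global statistics; between exchanges they drift apart, which is exactly the approximation the abstract reports as costing little accuracy in practice.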
Publisher
Database: Elsevier - ScienceDirect
Journal: Statistical Methodology - Volume 8, Issue 1, January 2011, Pages 3–17