Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications

Article code	Journal code	Publication year	English article	Full-text version
515381	867002	2015	37-page PDF	Free download

English title of the ISI article

Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications

Persian translation of the title

مدل سازی موضوع های احتمالی در تنظیمات چند زبانه: یک مرور کلی از روش و کاربردهای آن

Keywords

Multilingual probabilistic topic models, cross-lingual text mining, cross-lingual knowledge transfer, cross-lingual information retrieval, language-independent data representation, non-parallel data

Related subjects

Engineering and Basic Sciences, Computer Engineering, Computer Science Applications

Abstract

• A systematic overview of multilingual probabilistic topic modeling (MuPTM).
• A tutorial on methodology, modeling, training, output, inference and evaluation of MuPTM.
• Language-independent and language-pair independent data representations.
• A model-independent framework and applications in various cross-lingual tasks.
• A complete MuPTM-based framework for cross-lingual semantic similarity.

Probabilistic topic models are unsupervised generative models which model document content as a two-step generation process, that is, documents are observed as mixtures of latent concepts or topics, while topics are probability distributions over vocabulary words. Recently, a significant research effort has been invested into transferring the probabilistic topic modeling concept from monolingual to multilingual settings. Novel topic models have been designed to work with parallel and comparable texts. We define multilingual probabilistic topic modeling (MuPTM) and present the first full overview of the current research, methodology, advantages and limitations in MuPTM. As a representative example, we choose a natural extension of the omnipresent LDA model to multilingual settings called bilingual LDA (BiLDA). We provide a thorough overview of this representative multilingual model from its high-level modeling assumptions down to its mathematical foundations. We demonstrate how to use the data representation by means of output sets of (i) per-topic word distributions and (ii) per-document topic distributions coming from a multilingual probabilistic topic model in various real-life cross-lingual tasks involving different languages, without any external language pair dependent translation resource: (1) cross-lingual event-centered news clustering, (2) cross-lingual document classification, (3) cross-lingual semantic similarity, and (4) cross-lingual information retrieval. We also briefly review several other applications present in the relevant literature, and introduce and illustrate two related modeling concepts: topic smoothing and topic pruning. In summary, this article encompasses the current research in multilingual probabilistic topic modeling. By presenting a series of potential applications, we reveal the importance of the language-independent and language pair independent data representations by means of MuPTM. 
We provide clear directions for future research in the field by providing a systematic overview of how to link and transfer aspect knowledge across corpora written in different languages via the shared space of latent cross-lingual topics, that is, how to effectively employ learned per-topic word distributions and per-document topic distributions of any multilingual probabilistic topic model in various cross-lingual applications.
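
The abstract's idea of using per-topic word distributions for cross-lingual semantic similarity can be illustrated with a minimal sketch. It is not taken from the article: the two toy distributions (`phi_en`, `psi_nl`), the uniform topic prior, the cosine measure, and all function names are assumptions chosen for illustration. Each word is represented by its distribution over the shared topics, and words from different languages are compared in that shared space without any dictionary.

```python
import math

# Hypothetical per-topic word distributions P(word | topic) for K = 2 shared
# cross-lingual topics. All numbers are invented for illustration only.
phi_en = {  # source language (English)
    "football": [0.40, 0.05],
    "goal": [0.35, 0.05],
    "election": [0.05, 0.50],
    "vote": [0.20, 0.40],
}
psi_nl = {  # target language (Dutch)
    "voetbal": [0.45, 0.05],
    "doelpunt": [0.30, 0.05],
    "verkiezing": [0.05, 0.55],
    "stem": [0.20, 0.35],
}


def topic_profile(word_given_topic):
    """Return P(topic | word), assuming a uniform prior over topics."""
    total = sum(word_given_topic)
    return [p / total for p in word_given_topic]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, source, target):
    """Find the target-language word whose topic profile is closest to `word`."""
    profile = topic_profile(source[word])
    return max(target, key=lambda t: cosine(profile, topic_profile(target[t])))


if __name__ == "__main__":
    for w in phi_en:
        print(w, "->", most_similar(w, phi_en, psi_nl))
```

Because both vocabularies are described over the same topic indices, the comparison needs no language-pair-dependent translation resource. In practice the distributions would come from a trained multilingual topic model rather than hand-written tables.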

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 51, Issue 1, January 2015, Pages 111–147

Authors

Ivan Vulić, Wim De Smet, Jie Tang, Marie-Francine Moens
