Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
6951555 | 1451687 | 2016 | 19-page PDF | Free download |
English title of the ISI article
Topic modeling of Chinese language beyond a bag-of-words
Keywords
Topic models, Chinese language modeling, text classification, language model, character-word topic model, latent Dirichlet allocation
Related topics
Engineering and Basic Sciences
Computer Engineering
Signal Processing
Abstract
The topic model is one of the best-known hierarchical Bayesian models for language modeling and document analysis. It has achieved great success in text classification, in which a text is represented as a bag of its words, disregarding grammar and even word order; this is referred to as the bag-of-words assumption. In this paper, we investigate topic modeling of the Chinese language, which has a different morphology from alphabetic Western languages such as English. Chinese characters, rather than Chinese words, are the basic structural units of Chinese. Previous empirical studies have shown that the character-based topic model performs better than the word-based topic model. In this research, we propose the character-word topic model (CWTM) to take the character-word relation into account in topic modeling. Two types of experiments are designed to test the performance of the newly proposed model: topic extraction and text classification. Through empirical studies, we demonstrate the superiority of the newly proposed model over both word-based and character-based topic models.
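As a rough illustration of the representations the abstract contrasts, the sketch below builds a bag of words and a bag of characters for a short Chinese text. It is not the CWTM from the paper: the function names, the sample sentence, and its hand-made word segmentation are all assumptions chosen for the example.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count word tokens; the order of the tokens is discarded."""
    return Counter(tokens)


def bag_of_characters(tokens):
    """Count the individual characters of all tokens; order is discarded."""
    return Counter(ch for token in tokens for ch in token)


# Hypothetical, hand-segmented text (assumed segmentation, for illustration only).
doc_a = ["我", "喜欢", "中文", "文本"]
# The same words in a different order.
doc_b = ["文本", "中文", "喜欢", "我"]

word_bag = bag_of_words(doc_a)
char_bag = bag_of_characters(doc_a)

# Under the bag-of-words assumption the two documents are indistinguishable.
same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)

# The character 文 occurs in two different words (中文 and 文本), so the
# character-level counts expose a link between words that word counts hide.
shared_char_count = char_bag["文"]
```

Here the two word orderings yield identical word counts, while the character counts show that distinct words can share a character, which is the kind of character-word relation the abstract refers to.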
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 40, November 2016, Pages 60-78
Authors
Zengchang Qin, Yonghui Cong, Tao Wan