Article code: 4942675
Journal code: 1437414
Publication year: 2017
English article: 8-page PDF
English title of the ISI article
Mining coherent topics in documents using word embeddings and large-scale text data
Keywords
Topic model, domain knowledge, word embedding, large-scale text data
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract
Probabilistic topic models have been extensively used to extract low-dimensional aspects from document collections. However, without any human knowledge, such models often generate topics that are not interpretable. Recently, a number of knowledge-based topic models have been proposed, which enable users to input prior domain knowledge to produce more meaningful and coherent topics. Word embeddings, on the other hand, can automatically capture both semantic and syntactic information of words from a large number of documents, and can be used to measure word similarities. In this paper, we incorporate word embeddings obtained from a large number of domains into topic modeling. By combining Latent Dirichlet Allocation, a widely used topic model, with Skip-Gram, a well-known framework for learning word vectors, we improve the semantic coherence significantly. Our evaluation results using product review documents from 100 domains demonstrate the effectiveness of our method.
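The abstract notes that word embeddings can be used to measure word similarities. As a minimal sketch of that one idea (not the paper's method), the Python snippet below ranks words by cosine similarity. The three-dimensional vectors and the names `toy_embeddings`, `cosine_similarity` and `most_similar` are hypothetical, chosen only for illustration; real Skip-Gram vectors would be learned from a large corpus and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: made up by hand, not learned).
toy_embeddings = {
    "battery": [0.9, 0.1, 0.0],
    "charger": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.9, 0.2],
}


def most_similar(word, embeddings, top_n=2):
    """Return the top_n other words, ranked by cosine similarity to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for other, score in most_similar("battery", toy_embeddings):
        print(f"{other}: {score:.3f}")
```

With these toy vectors, "charger" ranks above "screen" for the query "battery". A knowledge-based topic model could use scores of this kind as prior knowledge about which words belong together, though how the paper actually combines them with Latent Dirichlet Allocation is not described in the abstract.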
Publisher
Database: Elsevier - ScienceDirect
Journal: Engineering Applications of Artificial Intelligence - Volume 64, September 2017, Pages 432-439