Article code: 402453
Journal code: 676948
Publication year: 2016
English article: 11 pages
Full-text version: PDF
English title (ISI article)
Improving short text classification by learning vector representations of both words and hidden topics
Keywords
Short text, Topic model, Data enrichment, Word and topic vectors
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract


• We exploit the knowledge from a topic-consistent corpus for topic modeling and use the topics to enrich the corpus and the short texts.
• We learn the vector representations of both words and topics interactively on the enriched corpus.
• We use the vectors of the words and topics to represent the features of short texts for training and classification.
• Our method performs better than many baselines.
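The enrichment step in the highlights above can be illustrated with a minimal sketch. It assumes a trained LDA model is already available as a K x V topic-word probability matrix `phi`, and it maps each word to its single most probable topic. This is a simplification: a real LDA run assigns a sampled topic to every token occurrence. The function names `build_word_topic` and `enrich_text` and the "TOPIC_<k>" token format are hypothetical, chosen only for illustration. Each topic is inserted right after its word so that an embedding learner sees topics as ordinary words.

```python
from typing import Dict, List


def build_word_topic(phi: List[List[float]], vocab: List[str]) -> Dict[str, int]:
    """Map each vocabulary word to the topic k with the largest phi[k][v].

    `phi` is assumed to be a K x V topic-word probability matrix, as
    estimated by LDA; using the argmax topic is an illustrative choice.
    """
    num_topics = len(phi)
    return {
        word: max(range(num_topics), key=lambda k: phi[k][v])
        for v, word in enumerate(vocab)
    }


def enrich_text(tokens: List[str], word_topic: Dict[str, int]) -> List[str]:
    """Insert a topic pseudo-word right after every token that has a topic."""
    enriched: List[str] = []
    for tok in tokens:
        enriched.append(tok)
        topic = word_topic.get(tok)
        if topic is not None:
            # Hypothetical "TOPIC_<k>" tokens become ordinary vocabulary items,
            # so an embedding learner such as word2vec can learn their vectors.
            enriched.append(f"TOPIC_{topic}")
    return enriched


# Toy example: two topics over a three-word vocabulary.
phi = [[0.7, 0.1, 0.2], [0.1, 0.8, 0.1]]
vocab = ["cheap", "flights", "paris"]
word_topic = build_word_topic(phi, vocab)
print(enrich_text(["cheap", "flights", "to", "paris"], word_topic))
# ['cheap', 'TOPIC_0', 'flights', 'TOPIC_1', 'to', 'paris', 'TOPIC_0']
```

Words outside the topic model's vocabulary (here "to") are kept unchanged and receive no topic token.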

This paper presents a general framework for short text classification that learns vector representations of both words and hidden topics together. We draw on a large-scale external data collection, referred to as the "corpus", which is topically consistent with the short texts to be classified, and use it to build a topic model with Latent Dirichlet Allocation (LDA). For every text in the corpus and every short text, the topics of words are treated as new words and integrated into the text to enrich the data. On the enriched corpus, we learn vector representations of both words and topics. Short texts can then be represented by features built from the vectors of both their words and their topics, and these features are used for training and classification. On an open short text classification data set, learning vectors of both words and topics significantly reduces the classification error compared with learning word vectors only. We also compare the proposed classification method with various baselines, and the experimental results confirm the effectiveness of our word/topic vector representations.
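The abstract does not say exactly how the word and topic vectors are combined into a feature vector for a short text. One plausible scheme is sketched below, under stated assumptions. The example assumes that topic pseudo-words carry a hypothetical "TOPIC_" prefix and that learned vectors are available in a plain dictionary `embeddings`. It averages the word vectors and the topic vectors separately and concatenates the two means. The averaging-and-concatenation choice and the function name `short_text_features` are illustrative only and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean of the vectors, or a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def short_text_features(
    enriched: List[str],
    embeddings: Dict[str, Vector],
    dim: int,
    topic_prefix: str = "TOPIC_",
) -> Vector:
    """Concatenate the mean word vector and the mean topic vector of a text.

    Tokens starting with `topic_prefix` are treated as topic pseudo-words;
    tokens missing from `embeddings` are skipped.
    """
    word_vecs = [embeddings[t] for t in enriched
                 if t in embeddings and not t.startswith(topic_prefix)]
    topic_vecs = [embeddings[t] for t in enriched
                  if t in embeddings and t.startswith(topic_prefix)]
    # Result has length 2 * dim: [mean word vector | mean topic vector].
    return mean_vector(word_vecs, dim) + mean_vector(topic_vecs, dim)


# Toy 2-dimensional vectors standing in for learned word/topic embeddings.
embeddings = {
    "cheap": [1.0, 0.0],
    "flights": [0.0, 1.0],
    "TOPIC_0": [2.0, 2.0],
    "TOPIC_1": [0.0, 4.0],
}
features = short_text_features(
    ["cheap", "TOPIC_0", "flights", "TOPIC_1", "to"], embeddings, dim=2
)
print(features)  # [0.5, 0.5, 1.0, 3.0]
```

The resulting fixed-length vectors could then be passed to any standard classifier. Keeping the word part and the topic part separate lets a classifier weight the two sources of evidence differently.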

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 102, 15 June 2016, Pages 76–86