Article code: 403430
Journal code: 677227
Year of publication: 2016
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
Reducing explicit semantic representation vectors using Latent Dirichlet Allocation
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract

Explicit Semantic Analysis (ESA) is a knowledge-based method that builds the semantic representation of words from the textual descriptions of concepts in a given knowledge source. Due to its simplicity and success, ESA has received wide attention from researchers in computational linguistics and information retrieval. However, the representation vectors formed by the ESA method are generally very large, high-dimensional, and may contain many redundant concepts. In this paper, we introduce a reduced semantic representation method that constructs the semantic interpretation of words as vectors over latent topics derived from the original ESA representation vectors. For modeling the latent topics, Latent Dirichlet Allocation (LDA) is adapted to the ESA vectors so that it extracts topics as probability distributions over concepts rather than over words, as in the traditional model. The proposed method is applied to two knowledge sources widely used in computational semantic analysis: WordNet and Wikipedia. For evaluation, we use the proposed method in two natural language processing tasks: measuring the semantic relatedness between words/texts and text clustering. The experimental results indicate that the proposed method overcomes the representation limitations of the ESA method.

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 100, 15 May 2016, Pages 145–159
Authors