کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
4942972 1437616 2017 28 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Content Tree Word Embedding for document representation
ترجمه فارسی عنوان
محتوا درخت ورد مونتاژ برای نمایندگی سند
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی
Only humans can understand and comprehend the actual meaning that underlies natural written language, whereas machines can form semantic relationships only after humans have provided the parameters that are necessary to model the meaning. To enable computer models to access the underlying meaning in written language, accurate and sufficient document representation is crucial. Recently, word embedding approaches have drawn much attention in text mining research. One of the main benefits of such approaches is the use of global corpuses with the generation of pre-trained word vectors. Although very effective, these approaches have their disadvantages. Relying only on pre-trained word vectors may neglect the local context and increase word ambiguity. In this study, a new approach, Content Tree Word Embedding (CTWE), is introduced to mitigate the risk of word ambiguity and inject a local context into globally pre-trained word vectors. CTWE is basically a framework for document representation while using word embedding feature learning. The CTWE structure is locally learned from training data and ultimately represents the local context. While CTWE is constructed, each word vector is updated based on its location in the content tree. For the task of classification, the results show an improvement in F-score and accuracy measures when using two deep learning-based word embedding approaches, namely GloVe and Word2Vec.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 90, 30 December 2017, Pages 241-249
نویسندگان
, ,