Comparative study of word embedding methods in topic segmentation

Article ID	Journal	Published Year	Pages	File Type
4960612	Procedia Computer Science	2017	10 Pages	PDF

Abstract

Vector representations of words are widely used in natural language processing tasks to capture the semantic meaning of words. Three well-known methods for learning them are LSA, Word2Vec and GloVe. In this paper, these methods are investigated in the context of topic segmentation for two languages, Arabic and English. Word2Vec is also studied in more depth, using different models and approximation algorithms. The results show that the performance of LSA, Word2Vec and GloVe depends on the language used. Word2Vec nevertheless provides the best word vector representations, although its quality depends on the choice of model.
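
To make the idea of using word vectors for topic segmentation concrete, the following is a minimal sketch and not the procedure evaluated in the paper. It assumes that each sentence is represented by the average of its word vectors and that a topic boundary is placed wherever the cosine similarity between adjacent sentences falls below a threshold. The vectors in `TOY_VECTORS`, the helper names and the threshold of 0.5 are all hypothetical and chosen only for illustration; in practice the vectors would come from a trained LSA, Word2Vec or GloVe model.

```python
import math

# Hypothetical 2-dimensional word vectors, invented for illustration only.
# Real vectors would come from a trained LSA, Word2Vec or GloVe model.
TOY_VECTORS = {
    "goal": (1.0, 0.1),
    "match": (0.9, 0.2),
    "team": (0.8, 0.0),
    "bank": (0.1, 1.0),
    "loan": (0.0, 0.9),
    "rate": (0.2, 0.8),
}


def sentence_vector(tokens, vectors):
    """Average the vectors of the tokens that have a known vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no known word in this sentence
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment(sentences, vectors, threshold=0.5):
    """Return each index i where a topic boundary is placed before sentence i."""
    vecs = [sentence_vector(s, vectors) for s in sentences]
    boundaries = []
    for i in range(1, len(vecs)):
        if vecs[i - 1] is None or vecs[i] is None:
            continue  # skip pairs that cannot be compared
        if cosine(vecs[i - 1], vecs[i]) < threshold:
            boundaries.append(i)
    return boundaries


sentences = [
    ["team", "goal"],
    ["match", "goal"],
    ["bank", "loan"],
    ["loan", "rate"],
]
boundaries = segment(sentences, TOY_VECTORS)
print(boundaries)
```

On this toy input the only low-similarity gap is between the second and third sentences, so a single boundary is reported there. Swapping the embedding source while keeping the segmentation step fixed is one simple way such methods could be compared.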

Keywords

LSA; Word2Vec; Word embedding; Topic segmentation; GloVe