Article ID: 4960866
Journal: Procedia Computer Science
Published Year: 2017
Pages: 9
File Type: PDF
Abstract

Microblogs such as Twitter and Sina Weibo provide a convenient, instant platform that makes information easy to share and acquire. However, the short, noisy, real-time nature of microblog posts makes Chinese microblog entity linking a new challenge. In this paper, we survey existing linking methods and describe our implementation for the Chinese microblog entity linking task. By crawling Baidu encyclopaedia web pages, we build polysemous, synonymous and index collections in MongoDB to manage the entities. We use a Chinese NLP tool named HanLP to extract nouns, and then generate the candidate set from these collections together with word similarity. For disambiguation, we use Word2vec, with a model trained on the THUC news corpus, to determine textual relevance between a mention's context and each candidate. Our approach performs well on the Sina Weibo data set.
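
The two stages named in the abstract, candidate generation followed by disambiguation through textual relevance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the in-memory dictionaries stand in for the MongoDB collections, the hand-written two-dimensional vectors stand in for a trained Word2vec model, the entity names and the helper functions `candidates` and `link` are hypothetical, and the word-similarity step of candidate generation is omitted.

```python
import math

# Hypothetical stand-ins for the synonymous and polysemous collections.
SYNONYMS = {"apple inc": "apple"}
POLYSEMOUS = {"apple": ["Apple (company)", "Apple (fruit)"]}

# Hypothetical description words for each entity (e.g. from its encyclopaedia page).
ENTITY_CONTEXT = {
    "Apple (company)": ["phone", "technology"],
    "Apple (fruit)": ["fruit", "tree"],
}

# Toy 2-d vectors; a real system would look these up in a trained Word2vec model.
VECTORS = {
    "phone": (1.0, 0.0),
    "technology": (0.9, 0.1),
    "release": (0.8, 0.2),
    "fruit": (0.0, 1.0),
    "tree": (0.1, 0.9),
    "eat": (0.2, 0.8),
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    known = [VECTORS[w] for w in words if w in VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidates(mention):
    """Normalise the mention through the synonym table, then list its senses."""
    key = mention.lower()
    key = SYNONYMS.get(key, key)
    return POLYSEMOUS.get(key, [])


def link(mention, context_words):
    """Return the candidate whose description is closest to the post context."""
    ctx = mean_vector(context_words)
    best, best_score = None, -1.0
    for entity in candidates(mention):
        ent = mean_vector(ENTITY_CONTEXT.get(entity, []))
        if ctx is None or ent is None:
            continue
        score = cosine(ctx, ent)
        if score > best_score:
            best, best_score = entity, score
    return best
```

In this sketch, `link("Apple", ["phone", "release"])` selects the company sense because the averaged context vector lies closer to that entity's description words, while a mention with no entry in the collections yields `None`.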
