Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4960866 | 1446504 | 2017 | 9-page PDF | Free download |

Microblogs such as Twitter and Sina Weibo provide a convenient, instant platform that makes information easy to share and acquire. However, the short, noisy, real-time nature of microblog posts makes Chinese microblog entity linking a new challenge. In this paper, we survey a range of linking methods and describe our implementation for the Chinese microblog entity linking task. By crawling Baidu encyclopaedia web pages, we build polysemy, synonym, and index collections in MongoDB to manage the entities. We use a Chinese NLP toolkit named HanLP to extract nouns, and then generate the candidate set from these collections together with word similarity. For disambiguation, we use Word2vec, with a model trained on THUC news, to determine textual relevance. Our approach performs well on the Sina Weibo data set.
Journal: Procedia Computer Science - Volume 111, 2017, Pages 37-45
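
The abstract does not give the scoring function used for disambiguation, so the following is only a minimal sketch of one common way to measure textual relevance with word embeddings: average the vectors of the words around a mention, average the vectors of each candidate's description, and pick the candidate with the highest cosine similarity. Everything named here is an assumption made for illustration: the `TOY_VECTORS` table stands in for a trained Word2vec model, the English tokens stand in for the nouns a tokenizer such as HanLP would extract, and the candidate dictionary stands in for entries fetched from the MongoDB collections.

```python
import math

# Hypothetical 3-dimensional embeddings; a real system would load
# vectors from a trained Word2vec model instead.
TOY_VECTORS = {
    "fruit":   [0.9, 0.1, 0.0],
    "juice":   [0.8, 0.2, 0.1],
    "eat":     [0.7, 0.3, 0.0],
    "phone":   [0.1, 0.9, 0.2],
    "company": [0.0, 0.8, 0.3],
    "release": [0.1, 0.7, 0.4],
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context_words, candidates):
    """Return the candidate id whose description is closest to the context.

    `candidates` maps a hypothetical entity id to the list of words that
    describe it. Returns None when the context has no known words.
    """
    context_vec = mean_vector(context_words)
    if context_vec is None:
        return None
    best_id, best_score = None, -1.0
    for entity_id, description_words in candidates.items():
        description_vec = mean_vector(description_words)
        if description_vec is None:
            continue  # skip candidates with no usable description
        score = cosine(context_vec, description_vec)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id
```

With two hypothetical candidates for the mention "apple", one described by `["fruit", "juice"]` and one by `["phone", "company"]`, a post whose extracted nouns are `["eat", "fruit"]` resolves to the first and a post with `["release", "phone"]` resolves to the second. Averaging vectors ignores word order, which is a simplification that tends to be acceptable for short posts but is not claimed to be the paper's method.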