Article code: 4960866
Journal code: 1446504
Publication year: 2017
English article: 9 pages, PDF
Full-text version: free download
English title of the ISI article
An approach on Chinese microblog entity linking combining baidu encyclopaedia and word2vec
Related topics
Engineering and basic sciences, computer engineering, computer science (general)
Abstract

Microblogs such as Twitter and Sina Weibo provide a convenient, instant platform that makes information easy to share and acquire. However, the short, noisy, real-time nature of microblog posts makes Chinese microblog entity linking a new challenge. In this paper, we survey a range of linking methods and describe our implementation for the Chinese microblog entity linking task. By crawling Baidu encyclopaedia web pages, we build polysemy, synonym and index collections in MongoDB to manage the entities. We use a Chinese NLP toolkit named HanLP to extract nouns, and then generate the candidate set from these collections together with word similarity. For disambiguation, we use Word2vec, with a model trained on THUC news, to measure textual relevance. Our approach performs well on the Sina Weibo data set.
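
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind the disambiguation step: look up the candidate entities for a mention, then pick the candidate whose description is closest to the post's context in embedding space. Everything here is an assumption made for illustration: the function names (`mean_vector`, `cosine`, `link_mention`), the plain dictionaries standing in for the MongoDB polysemy collection and the trained Word2vec model, the averaging of word vectors, and the toy two-dimensional vectors. It is not the authors' code.

```python
import math
from typing import Dict, List, Optional, Sequence


def mean_vector(words: Sequence[str],
                embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of the words that have an embedding.

    Returns None when none of the words is in the vocabulary.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_mention(mention: str,
                 context_words: Sequence[str],
                 polysemy_index: Dict[str, List[str]],
                 entity_descriptions: Dict[str, List[str]],
                 embeddings: Dict[str, List[float]]) -> Optional[str]:
    """Return the candidate entity most similar to the context, or None."""
    candidates = polysemy_index.get(mention, [])
    context_vec = mean_vector(context_words, embeddings)
    if not candidates or context_vec is None:
        return None
    best_entity, best_score = None, float("-inf")
    for entity in candidates:
        desc_vec = mean_vector(entity_descriptions.get(entity, []), embeddings)
        if desc_vec is None:
            continue  # no usable description words for this candidate
        score = cosine(context_vec, desc_vec)
        if score > best_score:
            best_entity, best_score = entity, score
    return best_entity


# Toy data (hypothetical): hand-written 2-d vectors instead of a trained model,
# and in-memory dicts instead of database collections.
EMBEDDINGS = {
    "fruit": [1.0, 0.0], "eat": [0.9, 0.1],
    "phone": [0.0, 1.0], "company": [0.1, 0.9],
}
POLYSEMY_INDEX = {"apple": ["Apple (fruit)", "Apple Inc."]}
ENTITY_DESCRIPTIONS = {
    "Apple (fruit)": ["fruit", "eat"],
    "Apple Inc.": ["phone", "company"],
}

linked = link_mention("apple", ["phone", "company"],
                      POLYSEMY_INDEX, ENTITY_DESCRIPTIONS, EMBEDDINGS)
```

With the toy data above, a context about phones and companies links "apple" to the "Apple Inc." entry, while a context about fruit links it to "Apple (fruit)". In a real system the context words would come from a noun extractor and the vectors from a trained embedding model.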

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 111, 2017, Pages 37-45