Article code: 515466
Journal code: 867023
Publication year: 2015
English article: 18-page PDF
Full-text version: Free download
English title of the ISI article
Analysis of named entity recognition and linking for tweets
Persian translation of the title
تجزیه و تحلیل شناسایی و دسته بندی نام برای توییت (literally: "Analysis of name recognition and classification for tweets")
Keywords
Information extraction, named entity recognition, entity disambiguation, microblogs, Twitter
Related topics
Engineering and basic sciences, Computer engineering, Computer science software
English abstract


• We analyse the named entity recognition and disambiguation performance on tweets.
• Multiple state-of-the-art systems are included.
• Commercial and academic systems suffer the same range of problems.
• Lack of context is a major problem, demanding new, custom NER & NEL approaches.
• A named entity linking corpus is released with the paper.

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
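
The pipeline stages named in the abstract can be illustrated with a toy sketch. This is not the method of any system evaluated in the paper: every function name, the stopword list, and the two-entry gazetteer below are hypothetical, and each stage is a deliberately naive stand-in (stopword counting for language identification, a regular expression for tokenisation, capitalisation for tagging and recognition, dictionary lookup for DBpedia linking).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy gazetteer: lowercased surface form -> DBpedia URI.
GAZETTEER = {
    "london": "http://dbpedia.org/resource/London",
    "twitter": "http://dbpedia.org/resource/Twitter",
}

# Tiny illustrative stopword list used as a stand-in language identifier.
ENGLISH_STOPWORDS = {"the", "is", "in", "to", "and", "of", "a", "for", "on"}

# Order matters: URLs, then @mentions/#hashtags, then words, then punctuation.
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")


@dataclass
class Entity:
    surface: str
    uri: Optional[str]  # None when no gazetteer entry matches


def identify_language(text: str) -> str:
    """Stage 1: guess 'en' if at least one English stopword occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    return "en" if any(w in ENGLISH_STOPWORDS for w in words) else "unknown"


def tokenise(text: str) -> List[str]:
    """Stage 2: tweet-aware tokenisation that keeps URLs, mentions, hashtags whole."""
    return TOKEN_RE.findall(text)


def tag_pos(tokens: List[str]) -> List[Tuple[str, str]]:
    """Stage 3: crude tagging based on surface shape only."""
    tagged = []
    for tok in tokens:
        if tok.startswith("http"):
            tag = "URL"
        elif tok.startswith("@"):
            tag = "MENTION"
        elif tok.startswith("#") or tok[0].isupper():
            tag = "PROPN"
        elif not tok[0].isalnum():
            tag = "PUNCT"
        else:
            tag = "OTHER"
        tagged.append((tok, tag))
    return tagged


def recognise_entities(tagged: List[Tuple[str, str]]) -> List[str]:
    """Stage 4: treat every PROPN token as a candidate entity mention."""
    return [tok.lstrip("#") for tok, tag in tagged if tag == "PROPN"]


def link_entities(mentions: List[str]) -> List[Entity]:
    """Stage 5: disambiguate by exact, case-insensitive gazetteer lookup."""
    return [Entity(m, GAZETTEER.get(m.lower())) for m in mentions]


def run_pipeline(text: str) -> List[Entity]:
    """Run the stages consecutively; non-English input is dropped early."""
    if identify_language(text) != "en":
        return []
    return link_entities(recognise_entities(tag_pos(tokenise(text))))


if __name__ == "__main__":
    for entity in run_pipeline("Stuck in London again, thanks @tfl #twitter http://t.co/x"):
        print(entity)
```

On the sample tweet, the capitalisation heuristic also proposes the sentence-initial word "Stuck" as a mention, which the linker leaves without a URI. Even in a toy, this shows how an error made in an early stage carries through to later stages of the pipeline.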

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 51, Issue 2, March 2015, Pages 32–49