Article code: 515564
Journal code: 867045
Publication year: 2013
English article: 10-page PDF
English title of the ISI article
Two-stage NER for tweets with clustering
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
Abstract

One main challenge of Named Entity Recognition (NER) for tweets is the insufficient information in a single tweet, owing to the noisy and short nature of tweets. We propose a novel system to tackle this challenge, which leverages redundancy in tweets by conducting two-stage NER over multiple similar tweets. Specifically, it first pre-labels each tweet using a sequential labeler based on a linear Conditional Random Field (CRF) model. It then clusters the tweets so that tweets with similar content fall into the same group. Finally, for each cluster, it refines the labels of each tweet using an enhanced CRF model that incorporates cluster-level information, i.e., the labels of the current word and its neighboring words across all tweets in the cluster. We evaluate our method on a manually annotated dataset and show that it raises F1 from 75.4% for the baseline without collective labeling to 82.5%.


► We study the task of named entity recognition for tweets, which is challenging owing to the dearth of information in a single tweet.
► We propose a novel system that conducts two-stage labeling to exploit the redundancy in similar tweets.
► We evaluate our method on a human-annotated dataset, and show that it outperforms a strong baseline without collective labeling.
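
The pipeline described in the abstract (pre-label, cluster, then refine with cluster-level information) can be illustrated with a small Python sketch. This is not the authors' implementation: the stage-one labels are hard-coded stand-ins for CRF output, the greedy Jaccard clustering and its `threshold` are assumptions (the abstract does not say how tweets are clustered), and the majority-vote features are only one plausible way to encode "the labels of the current word and its neighboring words across all tweets in the cluster". The refining CRF itself is omitted; the sketch stops at the features it might consume.

```python
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_tweets(tweets, threshold=0.3):
    """Greedy one-pass clustering (an assumed stand-in for the paper's method).

    Each tweet joins the first cluster whose seed tweet is similar enough,
    otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of tweet indices
    for i, tokens in enumerate(tweets):
        for members in clusters:
            if jaccard(tokens, tweets[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cluster_features(tweets, pre_labels, members):
    """Build cluster-level features for every token of every tweet in a cluster.

    For each token we record the majority stage-one label of the current word
    and of its left and right neighbours, counted over the whole cluster.
    """
    votes = defaultdict(Counter)
    for i in members:
        for word, label in zip(tweets[i], pre_labels[i]):
            votes[word.lower()][label] += 1

    def majority(word):
        return votes[word.lower()].most_common(1)[0][0]

    feats = {}
    for i in members:
        toks = tweets[i]
        feats[i] = [
            {
                "cur": majority(toks[j]),
                "prev": majority(toks[j - 1]) if j > 0 else "<s>",
                "next": majority(toks[j + 1]) if j + 1 < len(toks) else "</s>",
            }
            for j in range(len(toks))
        ]
    return feats


# Toy data: tokenized tweets and hypothetical stage-one labels.
tweets = [
    ["obama", "visits", "berlin", "today"],
    ["president", "obama", "visits", "berlin"],
    ["obama", "in", "berlin", "today"],
    ["new", "iphone", "released"],
]
pre_labels = [
    ["B-PER", "O", "B-LOC", "O"],
    ["O", "B-PER", "O", "B-LOC"],
    ["O", "O", "B-LOC", "O"],  # stage one missed "obama" here
    ["O", "B-PRODUCT", "O"],
]

clusters = cluster_tweets(tweets)
feats = cluster_features(tweets, pre_labels, clusters[0])
```

In this toy run the third tweet's "obama" was pre-labeled `O`, but its cluster-level `cur` feature is `B-PER` because the two similar tweets agree on that label. A second-stage labeler that sees this feature has evidence a single tweet does not provide, which is the redundancy the highlights refer to.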

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 49, Issue 1, January 2013, Pages 264–273