Article code  Journal code  Publication year  English article  Full-text version
4946133  1439269  2017  14-page PDF  Free download
English title of the ISI article
A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields
Persian translation of the title
ترکیبی از یادگیری فعال و خودآموز برای شناسایی موجودیت نام در توییتر با استفاده از زمینه های تصادفی شرطی
Keywords
Named entity recognition, active learning, self-learning, tweet
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
In recent years, many applications in natural language processing (NLP) have been developed using the machine learning approach. Annotating data is an important task in applying machine learning to NLP applications. A common approach to improve the system performance is to train on a large and high-quality set of training data that is annotated by experts. Besides, active learning (AL) and self-learning can be utilized to reduce the annotation costs. The self-learning method discovers highly reliable instances based on a trained classifier, while AL queries the most informative instances based on active query algorithms. This paper proposes a method that combines AL and self-learning to reduce the labeling effort for the named entity recognition task from tweet streams by using both machine-labeled and manually-labeled data. We employ AL queries based on the diversity of the context and content of instances to select the most informative instances. The conditional random fields are also chosen as an underlying model to train a classifier for selecting highly reliable instances. The experiments using Twitter data show that the proposed method achieves good results in reducing the human labeling effort, and it can significantly improve the performance of the systems.
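The division of labor described in the abstract can be illustrated with a minimal sketch: predictions the model is highly confident about are kept as machine labels (self-learning), while the least confident instances are sent to a human annotator (active learning). This is not the authors' algorithm. It assumes confidence alone drives both decisions, whereas the paper's queries also use the diversity of context and content. The function name `partition_pool`, the threshold, the budget, and the toy predictor with its scores are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a model maps an instance to
# (predicted_label, confidence in [0, 1]).
Predictor = Callable[[str], Tuple[str, float]]


def partition_pool(
    pool: Sequence[str],
    predict: Predictor,
    self_label_threshold: float = 0.9,
    query_budget: int = 2,
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Split an unlabeled pool into machine-labeled, human-queried, and remaining items.

    Assumption: confidence alone drives both decisions. The diversity-based
    query strategy mentioned in the abstract is not modeled here.
    """
    scored = [(item, *predict(item)) for item in pool]

    # Self-learning side: accept predictions the model is highly confident about.
    machine_labeled = [
        (item, label) for item, label, conf in scored
        if conf >= self_label_threshold
    ]

    # Active-learning side: rank the rest from least to most confident and
    # send the first `query_budget` items to a human annotator.
    uncertain = sorted(
        (entry for entry in scored if entry[2] < self_label_threshold),
        key=lambda entry: entry[2],
    )
    to_query = [item for item, _, _ in uncertain[:query_budget]]
    remaining = [item for item, _, _ in uncertain[query_budget:]]
    return machine_labeled, to_query, remaining


def toy_predict(token: str) -> Tuple[str, float]:
    # Stand-in for a trained sequence labeler such as a CRF; scores are invented.
    table = {
        "London": ("LOC", 0.97),
        "apple": ("ORG", 0.55),
        "lol": ("O", 0.99),
        "Jordan": ("PER", 0.40),
        "paris": ("LOC", 0.70),
    }
    return table[token]


pool = ["London", "apple", "lol", "Jordan", "paris"]
machine_labeled, to_query, remaining = partition_pool(pool, toy_predict)
```

In a full loop, the human-labeled and machine-labeled items would be added to the training set, the model retrained, and the partition repeated on the remaining pool.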
Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 132, 15 September 2017, Pages 179-187