Article ID Journal Published Year Pages File Type
563306 Computer Speech & Language 2008 18 Pages PDF
Abstract

Active learning (AL) is a framework that attempts to reduce the cost of annotating training material for statistical learning methods. While a lot of papers have been presented on applying AL to natural language processing tasks reporting impressive savings, little work has been done on defining a stopping criterion. In this work, we present a stopping criterion for active learning based on the way instances are selected during uncertainty-based sampling and verify its applicability in a variety of settings. The statistical learning models used in our study are support vector machines (SVMs), maximum entropy models and Bayesian logistic regression and the tasks performed are text classification, named entity recognition and shallow parsing. In addition, we present a method for multiclass mutually exclusive SVM active learning.

Related Topics
Physical Sciences and Engineering Computer Science Signal Processing
Authors
,