Article ID: 493261
Journal: Procedia Technology
Published Year: 2012
Pages: 8
File Type: PDF
Abstract

Named entities are the most informative elements of a textual document, and identifying them is important for extracting further information from the text. We have developed a conditional random field (CRF) based system to identify named entities in text from homeopathic diagnosis discussion forums. We manually annotated a training corpus for the task. Because manually creating a sufficiently large annotated corpus is costly and time-consuming, we use an active learning based semi-supervised framework that draws on un-annotated data to increase the efficiency of the system. Our system achieves a highest F-value of 84.35.
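
The abstract does not say how un-annotated sentences are chosen, so the following is only a minimal sketch of one common active learning step, least-confidence selection, and not the authors' method. It assumes a trained tagger (such as a CRF) can supply per-token marginal probabilities over tags. The tag names `O` and `MED`, the function names, and the toy pool are all hypothetical.

```python
from typing import Dict, List

# Hypothetical input format: one dict per token, mapping each tag to the
# model's marginal probability for that tag (as a CRF could provide).
TokenMarginals = Dict[str, float]
Sentence = List[TokenMarginals]


def sentence_confidence(marginals: Sentence) -> float:
    """Mean probability of the most likely tag across the sentence's tokens."""
    if not marginals:
        return 1.0  # assumption: an empty sentence is never worth annotating
    return sum(max(m.values()) for m in marginals) / len(marginals)


def select_for_annotation(pool: List[Sentence], k: int) -> List[int]:
    """Return indices of the k sentences the model is least confident about."""
    ranked = sorted(range(len(pool)), key=lambda i: sentence_confidence(pool[i]))
    return ranked[:k]


# Toy pool of three un-annotated sentences with made-up marginals.
pool: List[Sentence] = [
    [{"O": 0.9, "MED": 0.1}, {"O": 0.8, "MED": 0.2}],
    [{"O": 0.55, "MED": 0.45}, {"O": 0.5, "MED": 0.5}],
    [{"O": 0.99, "MED": 0.01}],
]

# Least-confident sentences come first.
print(select_for_annotation(pool, 2))
```

In a loop of this kind, the selected sentences would be annotated, added to the training corpus, and the tagger retrained. A self-training variant could instead keep the most confident predictions as automatic labels. Which of these the paper uses is not stated in the abstract.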
