The value of parsing as feature generation for gene mention recognition

Article ID	Journal	Published Year	Pages	File Type
517680	Journal of Biomedical Informatics	2009	10 Pages	PDF

Abstract

We measured the extent to which information surrounding a base noun phrase reflects the presence of a gene name, and evaluated seven different parsers on their ability to provide information for that purpose. Using the GENETAG corpus as a gold standard, we trained machine learning classifiers to recognize, from its context, whether a base noun phrase contained a gene name. Starting with the best lexical features, we assessed the gain from adding dependency or dependency-like relations derived from a full sentence parse. Features derived from parsers improved performance in this partial gene mention recognition task by a small but statistically significant amount. There were virtually no differences among the parsers in these experiments.
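To make the idea of parsing as feature generation concrete, the following is a minimal sketch of how context features for one base noun phrase might be assembled. It is an illustration only: the toy sentence, the hand-written dependency edges, the feature names, and the helper functions `lexical_features`, `dependency_features`, and `phrase_features` are all assumptions, not the feature set or parser output used in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical toy input: tokens plus hand-written dependency edges given as
# (head_index, relation, dependent_index). Not output from any real parser.
TOKENS: List[str] = ["The", "p53", "protein", "binds", "DNA"]
DEPS: List[Tuple[int, str, int]] = [
    (2, "det", 0),
    (2, "nn", 1),
    (3, "nsubj", 2),
    (3, "dobj", 4),
]


def lexical_features(tokens: List[str], start: int, end: int,
                     window: int = 2) -> Dict[str, int]:
    """Words within `window` tokens on either side of the phrase [start, end)."""
    feats: Dict[str, int] = {}
    for i in range(max(0, start - window), start):
        feats[f"left_word={tokens[i].lower()}"] = 1
    for i in range(end, min(len(tokens), end + window)):
        feats[f"right_word={tokens[i].lower()}"] = 1
    return feats


def dependency_features(tokens: List[str], deps: List[Tuple[int, str, int]],
                        start: int, end: int) -> Dict[str, int]:
    """Relations linking a token inside the phrase to a token outside it."""
    feats: Dict[str, int] = {}
    inside = range(start, end)
    for head, rel, dep in deps:
        if dep in inside and head not in inside:
            # The phrase is governed by an outside word.
            feats[f"governed_by={rel}:{tokens[head].lower()}"] = 1
        elif head in inside and dep not in inside:
            # The phrase governs an outside word.
            feats[f"governs={rel}:{tokens[dep].lower()}"] = 1
    return feats


def phrase_features(tokens: List[str], deps: List[Tuple[int, str, int]],
                    start: int, end: int) -> Dict[str, int]:
    """Lexical context features extended with parser-derived features."""
    feats = lexical_features(tokens, start, end)
    feats.update(dependency_features(tokens, deps, start, end))
    return feats


# Features for the base noun phrase "The p53 protein" (tokens 0..2).
features = phrase_features(TOKENS, DEPS, 0, 3)
```

In a setup like this, each base noun phrase would yield one sparse feature dictionary, which could then be passed to a classifier such as a support vector machine. Comparing a model trained on the lexical features alone with one trained on the combined dictionary mirrors the kind of gain measurement the abstract describes.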

Keywords

Named entity recognition; Support vector machines; Natural language processing; Machine learning