Article code: 518834
Journal code: 867617
Publication year: 2007
English article: 9 pages, PDF
Full-text version: free download
English title of the ISI article
Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
English abstract

Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in the literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appeared to have a detrimental effect on precision.
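
The two methods named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the two spelling rules (hyphen variation and letter-digit spacing), and the exact-match recall measure are hypothetical and are not the 23 rules or the evaluation procedure used in the paper.

```python
import re
from collections import defaultdict


def merge_dictionaries(*sources):
    """Union the synonym sets of several name dictionaries, keyed by gene identifier."""
    merged = defaultdict(set)
    for source in sources:
        for gene_id, names in source.items():
            merged[gene_id].update(names)
    return dict(merged)


def spelling_variants(name):
    """Generate variants of one name using two hypothetical, illustrative rules."""
    variants = {name}
    # Illustrative rule A: a hyphen may be written as a space or dropped.
    for v in list(variants):
        variants.add(v.replace("-", " "))
        variants.add(v.replace("-", ""))
    # Illustrative rule B: a space may appear between a letter and a following digit.
    for v in list(variants):
        variants.add(re.sub(r"(?<=[A-Za-z])(?=\d)", " ", v))
    return variants


def expand(dictionary):
    """Apply the spelling rules to every name of every gene."""
    return {
        gene_id: set().union(*(spelling_variants(n) for n in names))
        for gene_id, names in dictionary.items()
    }


def recall(dictionary, gold_names):
    """Fraction of gold-standard names found in the dictionary (exact match only)."""
    known = set().union(*dictionary.values())
    return sum(1 for n in gold_names if n in known) / len(gold_names)


# Toy data standing in for two source databases (hypothetical entries).
db_a = {"G1": {"IL-2"}}
db_b = {"G1": {"interleukin 2"}, "G2": {"p53"}}

merged = merge_dictionaries(db_a, db_b)
expanded = expand(merged)
gold = ["IL2", "p 53", "BRCA1"]
print(recall(merged, gold), recall(expanded, gold))
```

On this toy data the expanded dictionary matches more gold names than the merged one alone, which mirrors the direction of the recall effect the abstract reports. The sketch does not model precision, so it cannot show the detrimental effect some rules had.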

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Biomedical Informatics - Volume 40, Issue 3, June 2007, Pages 316–324