Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
517563 | 867464 | 2007 | 11-page PDF | Free download |

Objective: To develop an automated, high-throughput, and reproducible method for reclassifying and validating ontological concepts for natural language processing applications.

Design: We developed a distributional similarity approach to classify Unified Medical Language System (UMLS) concepts. Classification models were built for seven broad, biomedically relevant semantic classes created by grouping subsets of the UMLS semantic types. We used contextual features based on syntactic properties obtained from two different large corpora, with α-skew divergence as the similarity measure.

Measurements: The testing sets were generated automatically from the changes the National Library of Medicine made to the semantic classification of concepts between the UMLS 2005AA and 2006AA releases. Error rates were calculated and a misclassification analysis was performed.

Results: The lowest estimated error rates were 0.198 when the correct classification had to be our top prediction and 0.116 when it had to be among our top two predictions.

Conclusion: The results demonstrate that the distributional similarity approach can recommend high-level semantic classifications suitable for use in natural language processing.
Journal: Journal of the American Medical Informatics Association - Volume 14, Issue 4, July–August 2007, Pages 467–477
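
The abstract names α-skew divergence as the similarity measure but gives no formula or parameter values. As a rough illustration only, the sketch below uses the commonly cited definition, the Kullback-Leibler divergence D(r || αq + (1 − α)r), to rank candidate classes for a concept by how closely its context distribution matches each class profile. The value α = 0.99, the feature names, the class labels, and the helper functions `normalize`, `skew_divergence`, and `rank_classes` are all assumptions made for this example; they are not taken from the paper.

```python
import math


def normalize(counts):
    """Turn raw feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}


def skew_divergence(q, r, alpha=0.99):
    """Alpha-skew divergence: KL(r || alpha*q + (1 - alpha)*r).

    q and r are sparse distributions (dicts mapping feature -> probability).
    Mixing r into q keeps the denominator positive wherever r is positive,
    so the value stays finite even when q lacks some of r's features.
    alpha=0.99 is an assumed default, not a value from the abstract.
    """
    total = 0.0
    for feature, r_p in r.items():
        if r_p <= 0.0:
            continue
        mix = alpha * q.get(feature, 0.0) + (1.0 - alpha) * r_p
        total += r_p * math.log(r_p / mix)
    return total


def rank_classes(concept, class_profiles, alpha=0.99, top_k=2):
    """Return the top_k class labels, smallest divergence first."""
    scores = {
        label: skew_divergence(profile, concept, alpha)
        for label, profile in class_profiles.items()
    }
    return sorted(scores, key=scores.get)[:top_k]


# Hypothetical syntactic context counts (illustrative only).
class_profiles = {
    "Disorder": normalize({"subj_of:cause": 4, "obj_of:treat": 6}),
    "Chemical": normalize({"obj_of:administer": 7, "subj_of:inhibit": 3}),
}
concept = normalize({"obj_of:treat": 3, "subj_of:cause": 1})

print(rank_classes(concept, class_profiles))
```

Returning the two best-scoring labels mirrors the abstract's evaluation, which reports error rates for both the top prediction and the top two predictions.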