Article code: 518125
Journal code: 867559
Publication year: 2007
English article: 10 pages
Full text: PDF
English title of the ISI article
Rapidly Retargetable Approaches to De-identification in Medical Records
Related subjects
Engineering and Basic Sciences › Computer Engineering › Computer Science Applications
Abstract

Objective: This paper describes a successful approach to de-identification that was developed for a recent AMIA-sponsored challenge evaluation.
Method: Our approach focused on rapidly adapting two existing named entity recognition toolkits, Carafe and LingPipe.
Results: The “out of the box” Carafe system achieved a very good score (phrase F-measure of 0.9664) after only four hours of work to adapt it to the de-identification task. With further tuning, through task-specific feature engineering and the introduction of a lexicon, we reduced the token-level error term by over 36%, achieving a phrase F-measure of 0.9736.
Conclusions: We achieved good performance on the de-identification task by rapidly retargeting existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score.
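The results above are reported as phrase F-measures and as a reduction in an error term. The minimal Python sketch below makes these quantities concrete, assuming exact-match scoring of (document, start, end, PHI type) spans. All function names are hypothetical; this is not the Carafe or LingPipe API. Because the abstract gives no token-level scores, the error-reduction helper is applied to the two reported phrase F-measures, yielding roughly 21% at the phrase level, which is a different quantity from the 36% token-level figure. The last function shows one generic way to shift the recall/precision balance, by penalizing the "O" (outside) label before decoding. The abstract does not say this is the authors' method.

```python
from typing import Dict, List, Set, Tuple

# A phrase is (doc_id, start, end, phi_type); exact-match scoring is assumed.
Span = Tuple[str, int, int, str]


def phrase_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match phrase precision, recall and balanced F-measure."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def relative_error_reduction(f_before: float, f_after: float) -> float:
    """Fraction of the error term (1 - F) removed between two systems."""
    return ((1.0 - f_before) - (1.0 - f_after)) / (1.0 - f_before)


def decode_with_outside_bias(
    token_scores: List[Dict[str, float]], outside_bias: float
) -> List[str]:
    """Per-token argmax after subtracting a bias from the 'O' (outside) label.

    Hypothetical illustration: a positive bias labels more tokens as PHI,
    trading precision for recall; a negative bias does the opposite.
    """
    labels = []
    for scores in token_scores:
        adjusted = dict(scores)
        adjusted["O"] = adjusted["O"] - outside_bias
        labels.append(max(adjusted, key=adjusted.get))
    return labels


# Usage with toy data (not from the paper).
gold = {("d1", 0, 2, "PATIENT"), ("d1", 10, 11, "DATE")}
pred = {("d1", 0, 2, "PATIENT"), ("d1", 5, 6, "DATE")}
print(phrase_prf(gold, pred))  # (0.5, 0.5, 0.5)

# Phrase-level error reduction from the two F-measures reported above.
print(round(relative_error_reduction(0.9664, 0.9736), 3))  # 0.214

scores = [{"O": 0.6, "DATE": 0.4}, {"O": 0.9, "DATE": 0.1}]
print(decode_with_outside_bias(scores, 0.0))  # ['O', 'O']
print(decode_with_outside_bias(scores, 0.3))  # ['DATE', 'O']
```

A real CRF tagger such as Carafe decodes whole sequences (for example with Viterbi) rather than taking a per-token argmax. The per-token version is used here only to keep the effect of the bias easy to see.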

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of the American Medical Informatics Association - Volume 14, Issue 5, September–October 2007, Pages 564–573