Article code: 516261
Journal code: 1449131
Publication year: 2013
English article: 11-page PDF
Full-text version: free download
English title of the ISI article
Bootstrapping a de-identification system for narrative patient records: Cost-performance tradeoffs
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
English abstract


• The MIST de-identification framework can be used to build a de-identification system.
• A de-identification system can be built with less than one day's worth of effort.
• After just 33 min of annotation time, an F-score of 0.89 is achievable.
• History and physical notes are easier to de-identify than social work notes.

Purpose: We describe an experiment to build a de-identification system for clinical records using the open source MITRE Identification Scrubber Toolkit (MIST). We quantify the human annotation effort needed to produce a system that de-identifies at high accuracy.

Methods: Using two types of clinical records (history and physical notes, and social work notes), we iteratively built statistical de-identification models by annotating 10 notes, training a model, applying the model to another 10 notes, correcting the model's output, and training from the resulting larger set of annotated notes. This was repeated for 20 rounds of 10 notes each, and then an additional 6 rounds of 20 notes each, and a final round of 40 notes. At each stage, we measured precision, recall, and F-score, and compared these to the amount of annotation time needed to complete the round.

Results: After the initial 10-note round (33 min of annotation time) we achieved an F-score of 0.89. After just over 8 h of annotation time (round 21) we achieved an F-score of 0.95. Number of annotation actions needed, as well as time needed, decreased in later rounds as model performance improved. Accuracy on history and physical notes exceeded that of social work notes, suggesting that the wider variety and contexts for protected health information (PHI) in social work notes is more difficult to model.

Conclusions: It is possible, with modest effort, to build a functioning de-identification system de novo using the MIST framework. The resulting system achieved performance comparable to other high-performing de-identification systems.
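
The round schedule and the per-round metric described in the abstract can be tallied with a short Python sketch. It is illustrative only and is not part of MIST: the function names are hypothetical, the schedule is read literally from the Methods text (20 rounds of 10 notes, 6 rounds of 20, one round of 40), and the F-score is assumed to be the balanced F1 (harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
def round_schedule():
    """Notes per round, read literally from the abstract (an assumption):
    20 rounds of 10 notes, 6 rounds of 20 notes, 1 round of 40 notes."""
    return [10] * 20 + [20] * 6 + [40]


def cumulative_notes(schedule):
    """Running total of annotated notes available for training after each round."""
    totals = []
    running = 0
    for notes in schedule:
        running += notes
        totals.append(running)
    return totals


def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    schedule = round_schedule()
    totals = cumulative_notes(schedule)
    print("rounds:", len(schedule))
    print("total notes:", totals[-1])
    # Hypothetical precision/recall values, not figures from the paper.
    print("example F-score:", round(f_score(0.90, 0.88), 3))
```

Under this literal reading the experiment spans 27 rounds and 360 notes per note type; whether the initial 10-note round is counted among the 20 is not settled by the abstract.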

Publisher
Database: Elsevier - ScienceDirect
Journal: International Journal of Medical Informatics - Volume 82, Issue 9, September 2013, Pages 821–831