Free article download: De-identification of clinical narratives through writing complexity measures

Article code	Journal code	Publication year	English article	Full text
516144	1449118	2014	18-page PDF	Free download

English title of the ISI article

De-identification of clinical narratives through writing complexity measures

Persian translation of the title

شناسایی روایت های بالینی از طریق نوشتن مقیاس های پیچیدگی


Keywords

Electronic medical records, privacy, natural language processing

Related topics

Engineering and basic sciences, computer engineering, computer science software


English abstract

• Medical records can be clustered by their stylometric features.
• Stylometric clusters can yield high-performing de-identification models.
• De-identification models based on stylometric clusters outperform random models.
• Stylometric-feature-based models perform at least as well as models by document type.

Purpose

Electronic health records contain a substantial quantity of clinical narrative, which is increasingly reused for research purposes. To share data on a large scale and respect privacy, it is critical to remove patient identifiers. De-identification tools based on machine learning have been proposed; however, model training is usually based on either a random group of documents or a pre-existing document type designation (e.g., discharge summary). This work investigates whether inherent features, such as writing complexity, can identify document subsets that enhance de-identification performance.

Methods

We applied an unsupervised clustering method to group two corpora based on writing complexity measures: a collection of over 4500 documents of varying document types (e.g., discharge summaries, history and physical reports, and radiology reports) from Vanderbilt University Medical Center (VUMC) and the publicly available i2b2 corpus of 889 discharge summaries. We compare the performance (via recall, precision, and F-measure) of de-identification models trained on such clusters with models trained on documents grouped randomly or by VUMC document type.

Results

For the Vanderbilt dataset, training and testing de-identification models on the same stylometric cluster (average F-measure of 0.917) tended to outperform models based on clusters of random documents (average F-measure of 0.881). Increasing the size of a training subset sampled from a specific cluster could also yield improved results (e.g., for subsets from a certain stylometric cluster, the F-measure rose from 0.743 to 0.841 when the training size increased from 10 to 50 documents, and reached 0.901 when the training subset reached 200 documents). For the i2b2 dataset, training and testing on the same clusters based on complexity measures (average F-score 0.966) did not significantly surpass randomly selected clusters (average F-score 0.965).

Conclusions

Our findings illustrate that, in environments consisting of a variety of clinical documentation, de-identification models trained on clusters defined by writing complexity measures perform better than models trained on random groups and, in many instances, on document types.
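
The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not say which writing complexity measures or which clustering algorithm the authors used, so everything here is an assumption made for illustration: the two features (average sentence length and average word length), the plain k-means with a deterministic farthest-point initialization, the function names, and the four toy documents.

```python
import math
import re

def complexity_features(text):
    """Return (average sentence length in words, average word length).

    These two measures are illustrative stand-ins, not the paper's feature set.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return (0.0, 0.0)
    return (len(words) / len(sentences), sum(len(w) for w in words) / len(words))

def standardize(rows):
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels

# Invented toy documents: two terse report-style notes, two long narrative notes.
docs = [
    "No acute findings. Lungs clear. Heart normal.",
    "Chest clear. No effusion. No fracture.",
    "The patient was admitted following several days of progressively worsening "
    "shortness of breath and was subsequently treated with intravenous diuretics.",
    "During the hospitalization the patient remained hemodynamically stable and was "
    "discharged home with instructions to follow up with cardiology within two weeks.",
]
labels = kmeans(standardize([complexity_features(d) for d in docs]), k=2)
print(labels)
```

On this toy input the two terse notes fall into one cluster and the two narrative notes into the other. In the workflow the abstract describes, each such cluster would then serve as the training and test pool for its own de-identification model; that step is not sketched here.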

Publisher

Database: Elsevier - ScienceDirect
Journal: International Journal of Medical Informatics - Volume 83, Issue 10, October 2014, Pages 750–767

Authors

Muqun Li, David Carrell, John Aberdeen, Lynette Hirschman, Bradley A. Malin
