A computational framework for converting textual clinical diagnostic criteria into the quality data model

Article ID	Journal	Published Year	Pages	File Type
517022	Journal of Biomedical Informatics	2016	11 Pages	PDF

Abstract

• A computational framework for modeling textual diagnostic criteria is developed.
• A mechanism for classifying individual diagnostic criteria is created.
• A machine-learning algorithm for classifying criterion attributes is created.
• A computational pipeline prototype is developed.
• The tool's performance is evaluated with satisfactory results.
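The abstract does not spell out the actual classification rules, so the following is only a minimal sketch of what rule-based assignment of QDM datatypes to a single criterion could look like. The trigger lists `LAB_PATTERNS` and `SYMPTOM_PATTERNS` and the function `assign_qdm_datatypes` are hypothetical, invented for this illustration. The described framework first runs cTAKES to detect sentences and annotate events; this sketch skips that step and matches raw criterion text directly against the two datatypes the study covers, Symptom and Laboratory Test.

```python
import re
from typing import List

# HYPOTHETICAL trigger patterns for illustration only; the paper's real rules
# operate on cTAKES annotations and are not given in the abstract.
LAB_PATTERNS = [
    r"\b(serum|plasma|urine|blood)\b",            # specimen words
    r"\b(level|count|titer|concentration)\b",     # measurement words
    r"[<>]=?\s*\d",                               # numeric thresholds such as "> 10.5"
    r"\d+(\.\d+)?\s*(mg/dl|mmol/l|g/dl|iu/l|u/l|%)",  # value followed by a unit
]
SYMPTOM_PATTERNS = [
    r"\b(pain|fever|fatigue|nausea|rash|cough|headache|weakness)\b",
]


def assign_qdm_datatypes(criterion: str) -> List[str]:
    """Return every QDM datatype whose (hypothetical) rules fire on one criterion."""
    text = criterion.lower()
    datatypes = []
    if any(re.search(p, text) for p in LAB_PATTERNS):
        datatypes.append("Laboratory Test")
    if any(re.search(p, text) for p in SYMPTOM_PATTERNS):
        datatypes.append("Symptom")
    return datatypes


if __name__ == "__main__":
    for c in [
        "Serum calcium level > 10.5 mg/dL",
        "Persistent headache and fever",
        "Fever with white blood cell count > 11,000/uL",
    ]:
        print(c, "->", assign_qdm_datatypes(c))
```

The function returns a list rather than a single label because one criterion can carry more than one datatype (the abstract speaks of assigning "datatype(s)"); the third example above receives both Laboratory Test and Symptom.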

Background

Constructing standard, computable clinical diagnostic criteria is an important but challenging research area in the clinical informatics community. The Quality Data Model (QDM) is emerging as a promising information model for standardizing clinical diagnostic criteria.

Objective

To develop and evaluate automated methods for converting textual clinical diagnostic criteria into a structured format using QDM.

Methods

We used cTAKES, a clinical Natural Language Processing (NLP) tool, to detect sentences and annotate events in diagnostic criteria. We developed a rule-based approach for assigning QDM datatype(s) to each individual criterion, and we applied a machine-learning algorithm based on Conditional Random Fields (CRFs) to annotate the attributes belonging to each QDM datatype. We manually developed an annotated corpus as the gold standard and used standard measures (precision, recall, and F-measure) for the performance evaluation.

Results

We harvested 267 individual criteria with the datatypes Symptom and Laboratory Test from 63 textual diagnostic criteria. We manually annotated attributes and values in 142 individual Laboratory Test criteria. The rule-based approach achieved an average precision of 0.84, recall of 0.86, and F-measure of 0.85; the CRF-based classification achieved a precision of 0.95, recall of 0.88, and F-measure of 0.91. We also implemented a web-based tool that automatically translates textual Laboratory Test criteria into the QDM XML template format. These results indicate that our approaches leveraging cTAKES and CRFs are effective in facilitating diagnostic criteria annotation and classification.

Conclusion

Our NLP-based computational framework is a feasible and useful solution for representing and computerizing diagnostic criteria.
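The evaluation measures relate as precision P = TP/(TP + FP), recall R = TP/(TP + FN), and F-measure F = 2PR/(P + R), the harmonic mean of P and R. The Python sketch below (the function names `f_measure` and `precision_recall_f` are illustrative, not from the paper) computes them from raw counts and checks that the reported F-measures are consistent with the reported precision and recall.

```python
from typing import Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F-measure from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Figures reported in the abstract: (precision, recall, F-measure).
reported = {
    "rule-based datatype assignment": (0.84, 0.86, 0.85),
    "CRF attribute classification": (0.95, 0.88, 0.91),
}

if __name__ == "__main__":
    for name, (p, r, f) in reported.items():
        recomputed = round(f_measure(p, r), 2)
        print(f"{name}: reported F = {f:.2f}, recomputed F = {recomputed:.2f}")
```

Both recomputed values agree with the reported ones to two decimals (0.85 and 0.91). If "average" refers to a macro-average over criteria or attributes, the averaged F-measure need not equal the F-measure of the averaged precision and recall in general, so this check shows only that the published figures are mutually consistent.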


Keywords

Conditional random fields; Diagnostic criteria; Natural Language Processing