Article ID: 516097
Journal: International Journal of Medical Informatics
Published Year: 2016
Pages: 8
File Type: PDF
Abstract

• The proliferation of electronic medical records (EMR) provides a rich source of clinical data.
• Currently, no standardized approach has been proposed to assess data completeness.
• The six-step data quality framework can guide standardized metadata assessment.
• Applying the same assessment framework can lead to standardized reporting in research.

Introduction
The proliferation and use of electronic medical records (EMR) in the clinical setting now provide a rich source of clinical data that can be leveraged to support research on patient outcomes, comparative effectiveness, and health systems research. Once the large volume and variety of data that robust clinical EMRs provide is aggregated, the suitability of the data for research purposes must be addressed. Therefore, the purpose of this paper is two-fold. First, we present a stepwise framework capable of guiding initial data quality assessment when matching multiple data sources regardless of context or application. Then, we demonstrate a use case of initial analysis of a longitudinal data repository of electronic health record data that illustrates the first four steps of the framework, and report results.

Methods
A six-step data quality assessment framework is proposed and described that includes the following data quality assessment steps: (1) preliminary analysis, (2) documentation–longitudinal concordance, (3) breadth, (4) data element presence, (5) density, and (6) prediction. The six-step framework was applied to the Transport Data Mart, a data repository that contains over 28,000 records for patients who underwent interhospital transfer and that includes EMRs from the sending hospitalization, transport, and receiving hospitalization.

Results
There were a total of 9557 log entries, of which 8139 were successfully matched to corresponding hospital encounters. A total of 2832 were successfully mapped to both the sending and receiving hospital encounters (resulting in a 93% automatic matching rate), with 590 including air medical transport EMR data representing a complete case for testing. Results from Step 2 indicate that once records are identified and matched, there appears to be relatively limited drop-off of additional records as the criteria for matching increase, indicating that a proportion of records consistently contain nearly complete data. Measures of central tendency used in Steps 3 and 4 exhibit right skewness, suggesting that a small proportion of records contain the highest number of repeated measures for the measured variables.

Conclusions
The proposed six-step data quality assessment framework is useful in establishing the metadata for a longitudinal data repository that can be replicated by other studies. There are practical issues that need to be addressed, including the data quality assessments, with the most pressing being the need to establish data quality metrics for benchmarking acceptable levels of EMR data inclusiveness through testing and application.
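
The two kinds of checks reported in the Results, matching log entries to encounters and looking for right skew in the number of repeated measures per record, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the identifiers, and the toy data are hypothetical, and the skewness check (mean greater than median) is a crude heuristic chosen only for illustration.

```python
from statistics import mean, median


def match_rate(log_ids, encounter_ids):
    """Count log entries that have a corresponding encounter and return (count, rate)."""
    encounters = set(encounter_ids)
    matched = sum(1 for entry in log_ids if entry in encounters)
    rate = matched / len(log_ids) if log_ids else 0.0
    return matched, rate


def is_right_skewed(counts):
    """Heuristic only: a mean above the median suggests a long right tail."""
    return mean(counts) > median(counts)


# Hypothetical toy data (not taken from the Transport Data Mart).
log_ids = ["t1", "t2", "t3", "t4", "t5"]
encounter_ids = ["t1", "t2", "t4", "t9"]
matched, rate = match_rate(log_ids, encounter_ids)

# Hypothetical number of repeated measures recorded per patient record.
measures_per_record = [2, 3, 3, 4, 40]
skewed = is_right_skewed(measures_per_record)

print(f"matched {matched} of {len(log_ids)} entries ({rate:.0%}); right-skewed: {skewed}")
```

In this toy data a single record with 40 measures pulls the mean well above the median, which is the pattern the abstract describes for a small proportion of records holding the most repeated measures.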
