Denormalize and Delimit: How Not to Make Data Extraction for Analysis More Complex Than Necessary

Article ID	Journal ID	Publication year	English article	Full text
484163	703253	2016	9-page PDF	Free download

Keywords

Data extraction; Data transformation; Health services research; Data formats; Relational databases; Electronic health records

Related topics

Engineering and Basic Sciences; Computer Engineering; Computer Science (General)

Abstract

There are many legitimate reasons why standards for formatting of biomedical research data are lengthy and complex (Souza, Kush, & Evans, 2007). However, the common scenario of a biostatistician simply needing to import a given dataset into their statistical software is at best under-served by these standards. Statisticians are forced to act as amateur database administrators to pivot and join their data into a usable form before they can even begin the work that they specialize in doing. Or worse, they find their choice of statistical tools dictated not by their own experience and skills, but by remote standards bodies or inertial administrative choices. This may limit academic freedom. If the formats in question require the use of one proprietary software package, it also raises concerns about vendor lock-in (DeLano, 2005) and stewardship of public resources.

The logistics and transparency of data sharing can be made more tractable by an appreciation of the differences between structural, semantic, and syntactic levels of data interoperability. The semantic level is legitimately a complex problem. Here we make the case that, for the limited purpose of statistical analysis, a simplifying assumption can be made about the structural level: the needs of a large number of statistical models can often be met with a modified variant of the first normal form or 1NF (Codd, 1979). Once data is merged into one such table, the syntactic level becomes a solved problem, with many text-based formats available and robustly supported by virtually all statistical software without the need for any custom or third-party client-side add-ons. We implemented our denormalization approach in DataFinisher, an open source server-side add-on for i2b2 (Murphy et al., 2009), which we use at our site to enable self-service pulls of de-identified data by researchers.
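The general idea of denormalizing and then delimiting can be illustrated with a minimal Python sketch. This is not DataFinisher's code or its actual output layout; the field names (`patient_id`, `visit_date`, `concept`, `value`), the sample rows, and the choice to join repeated values with a semicolon are all assumptions made for illustration. The sketch pivots long-form observation rows into one wide row per patient visit and writes the result as tab-delimited text.

```python
import csv
import io

# Hypothetical long-form extract: one row per recorded fact.
# All field names and values here are invented for illustration.
observations = [
    {"patient_id": "P1", "visit_date": "2015-01-10", "concept": "systolic_bp", "value": "128"},
    {"patient_id": "P1", "visit_date": "2015-01-10", "concept": "hba1c", "value": "6.9"},
    {"patient_id": "P2", "visit_date": "2015-02-03", "concept": "systolic_bp", "value": "141"},
    {"patient_id": "P1", "visit_date": "2015-03-15", "concept": "systolic_bp", "value": "122"},
]


def denormalize(rows, key_fields=("patient_id", "visit_date")):
    """Pivot long-form rows into one wide record per key (here, per patient visit)."""
    wide = {}
    concepts = set()
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        record = wide.setdefault(key, dict(zip(key_fields, key)))
        concept = row["concept"]
        concepts.add(concept)
        if concept in record:
            # Assumption: repeated facts in one visit are joined, not dropped.
            record[concept] += ";" + row["value"]
        else:
            record[concept] = row["value"]
    header = list(key_fields) + sorted(concepts)
    return header, [wide[key] for key in sorted(wide)]


def to_delimited(header, records, delimiter="\t"):
    """Serialize the wide table as delimited text; missing cells are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=header, delimiter=delimiter, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


header, records = denormalize(observations)
table_text = to_delimited(header, records)
print(table_text)
```

A file in this shape can be read directly by the delimited-text importers that ship with common statistical packages, which is the sense in which the syntactic level stops being an obstacle once the data sits in a single table.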

Publisher

Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 80, 2016, Pages 1033–1041

Authors

Alex F. Bokov, Laura Manuel, Catherine Cheng, Angela Bos, Alfredo Tirado-Ramos
