Article code: 10225734
Journal code: 1701206
Publication year: 2018
English article: 19 pages, PDF
English title of the ISI article
Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study
Persian translation of the title
محاسبه مجدد انتخابی و تکراری مجدد داده های تحلیل داده های بزرگ: بینش های یک مطالعه موردی ژنومیک
Keywords
Re-computation, knowledge decay, big data analytics, genomics
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract
The value of knowledge assets generated by analytics processes using Data Science techniques tends to decay over time, as a consequence of changes in the elements the process depends on: external data sources, libraries, and system dependencies. For large-scale problems, refreshing those outcomes through greedy re-computation is both expensive and inefficient, as some changes have limited impact. In this paper we address the problem of refreshing past process outcomes selectively, that is, by trying to identify the subset of outcomes that will have been affected by a change, and by only re-executing fragments of the original process. We propose a technical approach to address the selective re-computation problem by combining multiple techniques, and present an extensive experimental study in Genomics, namely variant calling and the clinical interpretation of variants, to show its effectiveness. In this case study, we are able to decrease the number of required re-computations on a cohort of individuals from 495 (blind) down to 71, and to reduce runtime by at least 60% relative to the naïve blind approach, and in some cases by 90%. Starting from this experience, we then propose a blueprint for a generic re-computation meta-process that makes use of process history metadata to make informed decisions about selective re-computations in reaction to a variety of changes in the data.
Publisher
Database: Elsevier - ScienceDirect
Journal: Big Data Research - Volume 13, September 2018, Pages 76-94