Article code: 6874923
Journal code: 1441463
Publication year: 2018
English article: 43 pages
Full-text version: PDF
Title of the ISI-indexed article (English)
Making sense of performance in in-memory computing frameworks for scientific data analysis: A case study of the spark system
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract (English)
Over the last five years, Apache Spark has become a major software platform for in-memory data analysis. Acknowledging its widespread use, we present a comprehensive study of system characteristics of Spark targeting scientific data analytics performing large-scale matrix operations. We compare its performance to SciDB, a disk-based platform for array data analysis. A benchmark, ArrayBench, is developed to evaluate the performance of four analytics processing gene expression matrices using basic data operators of Spark and SciDB. It is applied to data from a real biological workflow whose data inputs are in matrix form. Herein, we report the findings, which shed light on the improvement of Spark and SciDB and the future development of data-intensive scientific data analytics using the in-memory computing frameworks.
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 120, October 2018, Pages 369-382
Authors
, , , ,