Article code: 425632
Journal code: 685799
Publication year: 2015
English article: 11-page PDF
Full-text version: free download
English title of the ISI article
High frequency batch-oriented computations over large sliding time windows
Persian translation of the title
محاسبات باند فرکانس بالا بر روی پنجره های بزرگ کشویی
Keywords
Event processing, batch processing, time-window-based computations, data analytics, big data
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• We present a batch-oriented approach for time window computations.
• We analyze the impact of input data organization on computation performance.
• We define a set of strategies for smartly organizing input data.
• We implement these strategies on the Hadoop/HDFS framework.
• We present experimental evaluations of the effectiveness of this solution.

Today’s business workflows are very likely to include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information for stakeholders and other interested parties. The frequency of these batch computations provides an effective measure of the freshness of the data analytics available to decision makers. Nevertheless, the amounts of data to process in a batch are typically so large that a computation can take a very long time. Since a new batch usually starts only when the previous one has completed, the frequency of such batches can be very low. In this paper we propose a model for batch processing based on overlapping sliding time windows that allows the frequency of batches to be increased. The model is well suited to scenarios (e.g., financial, security, etc.) characterized by large data volumes, observation windows on the order of hours (or days), and frequent updates (on the order of seconds). The model introduces multiple metrics aimed at reducing the latency between the end of a computation time window and the availability of results, thus increasing the frequency of the batches. These metrics specifically take into account the organization of input data to minimize its impact on such latency. The model is then instantiated on the well-known Hadoop platform, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated.
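
To make the idea of overlapping sliding time windows concrete, the following is a minimal sketch, not the paper's implementation. It assumes events are `(timestamp, value)` pairs with integer timestamps in seconds, that the window length is a multiple of the slide, and that the quantity of interest is a simple sum. The function names `bucket_by_slide` and `sliding_window_sums` are hypothetical. The sketch groups input into slide-sized buckets so that consecutive overlapping windows can reuse per-bucket partial results; this is a generic pane-based technique chosen for illustration, and the abstract does not confirm it as one of the paper's data-organization strategies.

```python
from collections import defaultdict


def bucket_by_slide(events, slide):
    """Group (timestamp, value) events into buckets that are `slide` seconds wide."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts // slide].append(value)
    return buckets


def sliding_window_sums(events, window, slide):
    """Map each window end time to the sum of values in [end - window, end).

    Assumption for this sketch: `window` is a multiple of `slide`, so every
    window is exactly a run of consecutive buckets.
    """
    if window % slide != 0:
        raise ValueError("window must be a multiple of slide")
    buckets = bucket_by_slide(events, slide)
    if not buckets:
        return {}
    # One partial aggregate per bucket; overlapping windows share these.
    partial = {b: sum(vals) for b, vals in buckets.items()}
    per_window = window // slide
    first, last = min(partial), max(partial)
    results = {}
    for end_bucket in range(first, last + 1):
        start_bucket = end_bucket - per_window + 1
        total = sum(partial.get(b, 0) for b in range(start_bucket, end_bucket + 1))
        results[(end_bucket + 1) * slide] = total
    return results


if __name__ == "__main__":
    sample = [(0, 1), (5, 2), (10, 3), (25, 4)]
    # 20-second windows that advance every 10 seconds.
    print(sliding_window_sums(sample, window=20, slide=10))
```

In this arrangement, each new window only needs the newest bucket plus partial results that were already computed, which is one way input organization can shorten the delay between the end of a window and the availability of its result.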

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volumes 43–44, February 2015, Pages 1–11