Article ID | Journal | Published Year | Pages |
---|---|---|---|
6873385 | Future Generation Computer Systems | 2018 | 24 |
Abstract
The importance of data collection, processing, and analysis is growing rapidly. Big Data technologies are in high demand in many fields, including bioinformatics, hydrometeorology, and high-energy physics. One of the most popular computational paradigms used in large-scale data processing frameworks is the MapReduce programming model. Today, the majority of integrated optimization mechanisms that quickly produce simple solutions typically consider only load balancing, which is not sufficient for advanced computations. Thus, more efficient and more sophisticated approaches are required. In this paper, we propose an improved category-based algorithm for reorganizing data in MapReduce frameworks that uses both replication and network transfer. Moreover, we introduce a customization of the algorithm for urgent computations, which impose specific requirements on execution time and reliability. We also consider modern data storage aspects, such as the ability to work with data on different “layers” (HDD, SSD, and RAM), which can greatly improve the overall performance of our solution.
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Anton Spivak, Andrew Razumovskiy, Denis Nasonov, Alexander Boukhanovsky, Anton Redice