Article ID: 6873385
Journal: Future Generation Computer Systems
Published Year: 2018
Pages: 24 Pages
File Type: PDF
Abstract
The importance of data collection, processing, and analysis is growing rapidly. Big Data technologies are in high demand in many fields, including bioinformatics, hydrometeorology, and high-energy physics. One of the most popular computational paradigms used in large-scale data processing frameworks is the MapReduce programming model. Today, the majority of integrated optimization mechanisms that quickly produce simple solutions consider only load balancing, which is not sufficient for advanced computations. Thus, more efficient and more complex approaches are required. In this paper, we propose an improved category-based algorithm for reorganizing data in MapReduce frameworks that uses both replication and network transfer. Moreover, we introduce a customization of the algorithm for urgent computations, which require specific approaches in terms of execution time and reliability. We also consider modern data storage aspects, such as the ability to work with data on different “layers” (HDD, SSD, and RAM), which can greatly improve the overall performance of our solution.
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics