Storage tier-aware replicative data reorganization with prioritization for efficient workload processing

Article ID	Journal	Published Year	Pages	File Type
6873385	Future Generation Computer Systems	2018	24 Pages	PDF

Abstract

The importance of data collection, processing, and analysis is rapidly growing. Big Data technologies are in high demand in many fields, including bio-informatics, hydrometeorology, and high energy physics. One of the most popular computational paradigms used in large data processing frameworks is the MapReduce programming model. Today, the majority of integrated optimization mechanisms that quickly produce simple solutions consider only load balancing, which is not sufficient for advanced computations. Thus, more efficient and complex approaches are required. In this paper, we propose an improved category-based algorithm for reorganizing data in MapReduce frameworks that uses both replication and network transfer. Moreover, we introduce a customization of the algorithm for urgent computations, which require specific approaches in terms of execution time and reliability. We also consider modern data storage aspects, such as the ability to work with data on different “layers” (HDD, SSD, and RAM), which can greatly improve the overall performance of our solution.
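As a rough illustration of the general idea of tier-aware placement with prioritization, the following minimal sketch uses a simple genetic algorithm to assign data blocks to storage layers. It is not the algorithm from the paper: the tier costs, the capacities, the `cost` and `evolve` functions, and all parameter values are hypothetical, and replication and network transfer are not modeled.

```python
import random

# Hypothetical relative read costs per storage layer (lower is faster).
TIER_COST = {"RAM": 1, "SSD": 5, "HDD": 20}
# Hypothetical capacity of each layer, in blocks.
TIER_CAPACITY = {"RAM": 2, "SSD": 3, "HDD": 10}
TIERS = list(TIER_COST)


def cost(placement, priorities):
    """Priority-weighted read cost plus a penalty for exceeding layer capacity."""
    total = sum(p * TIER_COST[t] for p, t in zip(priorities, placement))
    for tier, cap in TIER_CAPACITY.items():
        overflow = placement.count(tier) - cap
        if overflow > 0:
            total += 1000 * overflow
    return total


def evolve(priorities, generations=200, pop_size=30, mutation_rate=0.1, seed=0):
    """Search for a low-cost block-to-layer assignment (illustrative only)."""
    rng = random.Random(seed)
    n = len(priorities)
    # Start from an all-HDD baseline plus random candidate placements.
    population = [["HDD"] * n]
    population += [[rng.choice(TIERS) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda ind: cost(ind, priorities))
        survivors = population[: pop_size // 2]  # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally move a block to a random layer.
            child = [rng.choice(TIERS) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=lambda ind: cost(ind, priorities))


if __name__ == "__main__":
    block_priorities = [9, 7, 5, 3, 1, 1]  # higher means more urgent
    best = evolve(block_priorities)
    print(best, cost(best, block_priorities))
```

In this toy setup, a higher priority makes a block more expensive to leave on a slow layer, so the search tends to move urgent blocks toward RAM and SSD while the capacity penalty keeps the fast layers from being overfilled.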

Keywords

Genetic algorithm; Prioritization; Metaheuristic