Article code: 425091
Journal code: 685682
Publication year: 2013
English article: 14 pages, PDF
Full-text version: free download
English title of the ISI article
Scalable parallel computing on clouds using Twister4Azure iterative MapReduce
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
English abstract

Recent advances in data-intensive computing for scientific discovery are fueling a dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure and storage services, offers a very attractive environment in which scientists can perform data analytics. The challenges of large-scale distributed computation in cloud environments demand innovative computational frameworks that are specifically tailored to cloud characteristics, so that the power of clouds can be harnessed easily and effectively. Twister4Azure is a distributed, decentralized iterative MapReduce runtime for the Windows Azure Cloud. Twister4Azure extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling fault-tolerant execution of a wide array of data mining and data analysis applications on the Azure cloud. Twister4Azure uses the scalable, distributed and highly available Azure cloud services as its underlying building blocks, and employs a decentralized control architecture that avoids single points of failure. Twister4Azure optimizes iterative computations using multi-level data caching, cache-aware decentralized task scheduling, hybrid tree-based data broadcasting and hybrid intermediate data communication. This paper presents the Twister4Azure iterative MapReduce runtime and a study of four real-world data-intensive scientific applications implemented using Twister4Azure: two iterative applications, Multi-Dimensional Scaling and KMeans Clustering; and two pleasingly parallel applications, BLAST+ sequence searching and SmithWaterman sequence alignment. Performance measurements show comparable or a factor of 2 to 4 better results than the traditional MapReduce runtimes deployed on up to 256 instances and for jobs with tens of thousands of tasks. We also study and present solutions to several factors that affect the performance of iterative MapReduce applications on the Windows Azure Cloud.
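
To make the iterative MapReduce pattern mentioned in the abstract more concrete, the following is a minimal single-process sketch of KMeans Clustering written as repeated map and reduce steps. It is an illustration only and not Twister4Azure's API: the function names (`map_task`, `reduce_task`, `iterative_kmeans`) are hypothetical, no Azure services are used, and keeping the input partitions in memory across iterations merely stands in for the data caching the abstract describes. Passing the current centroids to every map task stands in for broadcasting.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Dict[int, Tuple[Point, int]]  # centroid index -> (coordinate sums, count)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_task(partition: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Map: per-partition coordinate sums and counts for each centroid."""
    partial: Partial = {}
    for point in partition:
        k = nearest(point, centroids)
        total, count = partial.get(k, ((0.0,) * len(point), 0))
        partial[k] = (tuple(t + p for t, p in zip(total, point)), count + 1)
    return partial


def reduce_task(partials: Sequence[Partial], centroids: Sequence[Point]) -> List[Point]:
    """Reduce: combine the partial sums from all map tasks into new centroids."""
    sums: Partial = {}
    for partial in partials:
        for k, (total, count) in partial.items():
            acc, n = sums.get(k, ((0.0,) * len(total), 0))
            sums[k] = (tuple(a + t for a, t in zip(acc, total)), n + count)
    # A centroid with no assigned points keeps its previous position.
    return [
        tuple(s / sums[k][1] for s in sums[k][0]) if k in sums else centroids[k]
        for k in range(len(centroids))
    ]


def iterative_kmeans(
    partitions: Sequence[Sequence[Point]],
    centroids: Sequence[Point],
    max_iterations: int = 20,
    tolerance: float = 1e-9,
) -> List[Point]:
    """Driver loop: rerun map and reduce until the centroids stop moving."""
    current = list(centroids)
    for _ in range(max_iterations):
        # `partitions` is loop-invariant; only `current` changes per iteration.
        partials = [map_task(part, current) for part in partitions]
        updated = reduce_task(partials, current)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(current, updated)
        )
        current = updated
        if shift <= tolerance:
            break
    return current
```

In this sketch the large input (`partitions`) never changes between iterations, while the small centroid list is recomputed each round. That split between static and dynamic data is what makes caching and broadcasting worthwhile in an iterative runtime.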


► Twister4Azure is an iterative MapReduce framework optimized for Azure Cloud.
► Efficient, easy-to-use, scalable parallel computation can be performed using Twister4Azure.
► Twister4Azure features a lightweight, distributed, decentralized architecture with fault tolerance.
► Four scientific applications (Multi-Dimensional Scaling, KMeans Clustering, BLAST+ and all-pairs sequence alignment) are implemented using Twister4Azure.
► Applications perform comparably to or better than traditional MapReduce frameworks (e.g., Hadoop).

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 29, Issue 4, June 2013, Pages 1035–1048