Article code: 4950538
Journal code: 1440647
Publication year: 2017
English article: 11 pages, PDF
Full-text version: Free download
English title of the ISI article
Scalable and efficient data distribution for distributed computing of all-to-all comparison problems
Persian translation of the title
توزیع داده های مقیاس پذیر و کارآمد برای محاسبات توزیع شده از تمام مشکلات مقایسه
Keywords
Distributed computing, big data, all-to-all comparison, data distribution
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
English abstract
All-to-all comparison problems represent a class of big data processing problems widely found in many application domains. To achieve high performance for distributed computing of such problems, storage usage, data locality and load balancing should be considered during the data distribution phase in the distributed environment. Existing data distribution strategies, such as the one used by Hadoop, are designed for problems with the MapReduce pattern and do not consider comparison tasks at all. As a result, a huge amount of data must be re-arranged at runtime when the comparison tasks are executed, degrading the overall computing performance significantly. To address this problem, this paper presents a scalable and efficient data distribution strategy, designed with comparison tasks in mind, for distributed computing of all-to-all comparison problems. Specifically designed for problems with the all-to-all comparison pattern, it not only saves storage space and data distribution time but also achieves load balancing and good data locality for all comparison tasks of the all-to-all comparison problems. Experiments are conducted to demonstrate the presented approach. It is shown that about 90% of the ideal performance capacity of the multiple machines can be achieved using the approach presented in this paper.
Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 67, February 2017, Pages 152-162