کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
524054 868548 2014 20 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A Generate-Test-Aggregate parallel programming library for systematic parallel programming
ترجمه فارسی عنوان
یک کتابخانه برنامه نویسی موازی مولد-تست-مجموعه ای برای برنامه نویسی موازی سیستماتیک
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر
چکیده انگلیسی


• We implement a language library (in Scala) for systematic parallel programming.
• This library provides a Generate-Test-Aggregate DSL as the programming interface.
• The GTA programs are automatically transformed to efficient MapReduce programs.
• Our library can significantly decrease the difficulties in parallel programming.
• Evaluation of various GTA examples shows high performance and good scalability.

The Generate-Test-Aggregate (GTA for short) algorithm is modeled following a simple and straightforward programming pattern, for combinatorial problems. First, generate all candidates; second, test and filter out invalid ones; finally, aggregate valid ones to make the final result. These three processing steps can be specified by three building blocks namely, generator, tester, and aggregator. Despite the simplicity of algorithm design, implementing the GTA algorithm naively following the three processing steps, i.e., brute-force, will result in an exponential-cost computation, and thus it is impractical for processing large data. The theory of GTA illustrates that if the definitions of generator, tester, and aggregator satisfy certain conditions, an efficient (usually near-linear cost) MapReduce program can be automatically derived from the GTA algorithm.The principle of GTA is attractive but how to make it being practically useful, remains as an important and challenge problem due to the complexity of GTA program transformations. In this paper, we report on our studying and implementation of a practical GTA library (written in the functional language Scala) which provides a systematic parallel programming approach for big-data analysis with MapReduce. The library provides a simple functional style programming interface and hides all the internal transformations. With this library, users can write parallel programs in a sequential manner in terms of the GTA algorithm, and the efficiency of the generated MapReduce programs is guaranteed systematically. Therefore, parallel programming for many problems could become no more a tough job. We demonstrate the usefulness of our GTA library on some interesting problems involving large data and show that lots of applications can be easily and efficiently solved by using our library.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Parallel Computing - Volume 40, Issue 2, February 2014, Pages 116–135
نویسندگان
, , ,