Article code | Journal code | Publication year | Article language | Full-text version |
---|---|---|---|---|
524648 | 868800 | 2013 | English, 11-page PDF | Free download |

• Analysis of memory allocators on SMPs with up to 512 cores.
• We measured NUMA traffic to quantify how well allocators preserve memory locality.
• Sixfold speedup with a basic custom allocator on top of the stock one.
• Optimized MapReduce framework for large shared-memory machines.
• We verified the SMP results with an MPI/OpenMP implementation on the same hardware.
The standard memory allocators of shared-memory systems (SMPs) often provide poor performance because they do not sufficiently reflect the access latencies of deep NUMA architectures with their on-chip, off-chip, and off-blade communication. We analyze memory allocation strategies for data-intensive MapReduce applications on SMPs with up to 512 cores and 2 TB of memory. We compare the efficiency of the MapReduce frameworks MR-Search and Phoenix++ and provide performance results for two benchmark applications, k-means and shortest-path search. Even on small SMPs with 128 cores, a sixfold speedup can be achieved by replacing the standard glibc allocator with allocators that use pooling strategies, and these savings become more pronounced on larger SMPs. We identify two types of overhead: (1) the cost of executing the malloc/free operations and (2) the poor memory locality caused by an ineffective mapping to the underlying memory hierarchy. We give detailed results on the NUMA traffic and show how the cost increases on large SMPs with many cores and a deep NUMA hierarchy. For verification, we run hybrid MPI/OpenMP implementations of the same benchmarks on systems with explicit message passing. The results reveal that neither the hardware nor the Linux kernel is the bottleneck; rather, it is the poor locality of the allocated memory pages.
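To illustrate the first type of overhead, the sketch below shows a minimal per-thread pooling allocator layered on top of the stock malloc. It is not the paper's allocator; all names (`pool_t`, `pool_alloc`, `pool_release`, `POOL_CHUNK`) are illustrative assumptions. The idea is that worker threads bump-allocate small records from large chunks and release a whole phase's allocations at once, so most malloc/free calls and their lock contention disappear.

```c
/*
 * Minimal sketch of a per-thread pooling allocator on top of the stock
 * malloc.  Names and sizes are illustrative, not taken from the paper.
 * Assumes individual requests are smaller than POOL_CHUNK.
 */
#include <assert.h>
#include <stdlib.h>

#define POOL_CHUNK (1 << 20)          /* grab memory from malloc in 1 MiB chunks */

typedef struct pool_block {
    struct pool_block *next;          /* previously filled chunks */
    size_t used;                      /* bytes handed out from this chunk */
    unsigned char data[];             /* chunk payload */
} pool_block;

typedef struct {
    pool_block *head;                 /* current (partially filled) chunk */
} pool_t;

/* Bump-allocate from the current chunk; fetch a new chunk when it is full. */
static void *pool_alloc(pool_t *p, size_t n)
{
    n = (n + 15) & ~(size_t)15;       /* keep 16-byte alignment */
    assert(n <= POOL_CHUNK);
    if (!p->head || p->head->used + n > POOL_CHUNK) {
        pool_block *b = malloc(sizeof *b + POOL_CHUNK);
        if (!b) return NULL;
        b->next = p->head;
        b->used = 0;
        p->head = b;
    }
    void *ptr = p->head->data + p->head->used;
    p->head->used += n;
    return ptr;
}

/* Release everything at once, e.g. after a map or reduce phase. */
static void pool_release(pool_t *p)
{
    while (p->head) {
        pool_block *next = p->head->next;
        free(p->head);
        p->head = next;
    }
}

/* One pool per worker thread avoids contention on the global allocator. */
static __thread pool_t tls_pool;
```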
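The second type of overhead, poor locality on a deep NUMA hierarchy, can be attacked by placing each worker's data on its own NUMA node. The following sketch uses the standard libnuma interface (compile with `-lnuma`); the partition size and usage pattern are assumptions for illustration only, not the configuration used in the paper.

```c
/*
 * Sketch of NUMA-aware placement with libnuma (link with -lnuma).
 * Each worker allocates its partition of the input on its own NUMA node,
 * so map tasks read node-local pages instead of generating off-chip or
 * off-blade traffic.
 */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return EXIT_FAILURE;
    }

    size_t bytes = 64UL << 20;                    /* 64 MiB partition per worker (illustrative) */
    int node = numa_node_of_cpu(sched_getcpu());  /* NUMA node of the calling thread */

    /* Pages are physically placed on this thread's own NUMA node. */
    double *partition = numa_alloc_onnode(bytes, node);
    if (!partition) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return EXIT_FAILURE;
    }

    /* ... fill and process the partition with node-local accesses ... */

    numa_free(partition, bytes);
    return EXIT_SUCCESS;
}
```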
Journal: Parallel Computing - Volume 39, Issue 12, December 2013, Pages 879–889