Article code: 432363
Journal code: 688865
Publication year: 2014
English article: 14-page PDF
Full-text version: free download
English title of the ISI article
Direct distributed memory access for CMPs
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• A direct distributed memory access (DDMA) model is proposed.
• Remote memory can be directly accessed via remote-to-local virtualization.
• NoC protocol translation can be eliminated in the DDMA model.
• Detailed architectural support for the DDMA model is discussed.
• We show the performance improvement of DDMA over the traditional PDMA model.

On-chip distributed memory has emerged as a promising memory organization for future many-core systems, since it efficiently exploits memory-level parallelism and lightens the load on each memory module by providing a number of memory interfaces comparable to the number of on-chip cores. The packet-based memory access (PDMA) model provides a scalable and flexible solution for distributed memory management, but suffers from complicated and costly on-chip network protocol translation and massive interference among packets, which lead to unpredictable performance. In this paper we propose a direct distributed memory access (DDMA) model, in which local cores access remote memory directly via remote-to-local virtualization, without network protocol translation. From the perspective of local cores, remote memory controllers (MCs) can be manipulated directly by accessing the local agent MC, which is responsible for accessing remote memory through high-performance inter-tile communication. We further discuss detailed architectural support for the DDMA model, including the memory interface design, the workflow, and the protocols involved. Simulation results from executing PARSEC benchmarks show that our DDMA architecture outperforms PDMA by 17.8% in average memory access latency and by 16.6% in IPC on average. Moreover, DDMA manages congested memory traffic better: a reduction of bandwidth when running memory-intensive SPEC2006 workloads incurs only an 18.9% performance penalty, compared with 38.3% for PDMA.
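
The idea of a local agent MC standing in for remote controllers can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the architecture from the paper: the class `MemoryController`, the helper `build_tiles`, the constant `TILE_MEM_BYTES`, and the even split of the address space across tiles are all hypothetical, and the inter-tile link is reduced to a direct method call.

```python
TILE_MEM_BYTES = 1 << 20  # assumed per-tile memory size, for illustration only


class MemoryController:
    """Toy MC that owns one contiguous slice of a shared address space."""

    def __init__(self, tile_id):
        self.tile_id = tile_id
        self.peers = {}      # tile_id -> MemoryController (stand-in for inter-tile links)
        self.storage = {}    # local offset -> value
        self.forwarded = 0   # number of requests passed on to a remote MC

    def owner_of(self, addr):
        # Assumed mapping: addresses are split evenly across tiles.
        return addr // TILE_MEM_BYTES

    def read(self, addr):
        owner = self.owner_of(addr)
        if owner == self.tile_id:
            return self.storage.get(addr % TILE_MEM_BYTES, 0)
        # The core only ever talks to its local agent MC; the agent
        # forwards the request to the MC that owns the address.
        self.forwarded += 1
        return self.peers[owner].read(addr)

    def write(self, addr, value):
        owner = self.owner_of(addr)
        if owner == self.tile_id:
            self.storage[addr % TILE_MEM_BYTES] = value
            return
        self.forwarded += 1
        self.peers[owner].write(addr, value)


def build_tiles(n):
    """Create n toy MCs and connect every MC to every other one."""
    tiles = [MemoryController(i) for i in range(n)]
    for mc in tiles:
        mc.peers = {other.tile_id: other for other in tiles if other is not mc}
    return tiles
```

In this sketch a core on tile 0 calls `tiles[0].write(addr, value)` for any address, and the agent decides whether the access is local or must be forwarded. The model says nothing about latency, bandwidth, or the protocols involved; it only shows the remote-to-local indirection.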

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 2, February 2014, Pages 2109–2122
Authors
, , ,