Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
462994 | 696939 | 2015 | 12-page PDF | Free download |

Emerging integrated CPU + FPGA hybrid platforms, such as the Extensible Processing Platform architecture from Xilinx [1], offer an unprecedented opportunity to achieve both multifunctionality and real-time responsiveness for memory-intensive embedded applications. However, how to cost-effectively synthesize application-specific hardware constructs that fully exploit memory-level parallelism remains a key challenge. To address this problem, we propose a new FPGA-based embedded computer architecture, ASTRO (Application-Specific Hardware Traces with Reconfigurable Optimization). Our main contribution is an integrated methodology for constructing an application-specific memory access network that extracts the maximum amount of memory-level parallelism on a per-application basis. In particular, the proposed ASTRO architecture can (1) perform dynamic memory analysis to maximally extract the target application's instruction-, loop- and memory-level parallelism for performance enhancement, (2) synthesize highly efficient accelerators that enable parallelized memory accesses, and thereby (3) accomplish effective data orchestration by exploiting the capabilities of modern FPGA devices: abundant distributed block RAMs and reprogrammability.

To empirically validate the ASTRO methodology, we have implemented a baseline embedded processor platform, a conventional CPU + accelerator with a centralized single memory, and a prototype ASTRO machine based on Xilinx MicroBlaze technology. Our experimental results show that, on average over 10 benchmark applications from SPEC2006 and MiBench [2], the ASTRO machine achieves an 8.6× speedup compared to the baseline embedded processor platform and a 1.7× speedup compared to the conventional CPU + accelerator platform. More interestingly, the ASTRO platform achieves more than a 40% reduction in energy-delay product compared to the conventional CPU + accelerator with a centralized memory.
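The abstract itself contains no code; as a rough illustration of the memory-level parallelism it describes, the sketch below shows a hypothetical HLS-style C kernel whose operands have been pre-partitioned across separate block-RAM banks, so that several loads can be serviced per cycle instead of contending for one centralized memory port. The names `bank0`..`bank3`, `N`, and `sum_banked` are illustrative assumptions, not taken from the paper or its toolflow.

```c
/* Illustrative sketch only: not ASTRO's actual synthesis output.
 * With a single centralized memory, the four reads per iteration serialize;
 * keeping each operand array in its own block-RAM bank lets an FPGA
 * accelerator issue the loads in parallel. */
#define N 1024

int sum_banked(const int bank0[N], const int bank1[N],
               const int bank2[N], const int bank3[N])
{
    int acc = 0;
    for (int i = 0; i < N; i++) {
        /* Each operand lives in a separate block RAM, so all four loads
         * can be serviced in the same cycle rather than back to back. */
        acc += bank0[i] + bank1[i] + bank2[i] + bank3[i];
    }
    return acc;
}
```

This kind of per-application bank assignment is what the abstract calls an application-specific memory access network; the reported 1.7× gain over a CPU + accelerator with a centralized memory is attributed to exactly this parallelization of memory accesses.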
Journal: Microprocessors and Microsystems - Volume 39, Issue 7, October 2015, Pages 553–564