Article ID: 424574
Journal: Future Generation Computer Systems
Published Year: 2015
Pages: 10 Pages
File Type: PDF
Abstract

• RDMA needs independent memory management by the NIC in host/GPU virtual address space.
• This requires fast lookup of buffers and Virtual-to-Physical address translation.
• We developed an ASIP for the FPGA to accelerate these operations with good results.
• ASIP design has been effective thanks to architecture exploration toolsuite.

We developed a point-to-point, low-latency, 3D torus network controller, integrated in an FPGA-based PCIe board, that implements a Remote Direct Memory Access (RDMA) communication protocol. RDMA requires the ability to access the application memory of the remote node directly, with minimal OS or CPU intervention. A key element is therefore a direct memory writing mechanism that addresses the destination buffers; on operating systems that support virtual memory, each such write corresponds to a number of page-segmented DMAs. To affect overall performance as little as possible, both Virtual-to-Physical address translation and scanning of the registered buffer list need the lowest possible latency. In a first implementation these tasks ran on a soft-core μC on the FPGA, which led to a latency of 1.6 μs to process a single packet and limited the peak bandwidth. As a second attempt, we present an accelerated version of these time-critical network functions that uses an application-specific instruction-set processor (ASIP), designed with a retargetable ASIP development toolsuite that allows architectural exploration. Benchmark results for the Buffer Search and Virtual-to-Physical tasks on the ASIP show latency improvements, with a cycle cost up to ten times lower than on the soft-core μC.
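
To make the two time-critical tasks concrete, the following is a minimal software sketch of what "Buffer Search" and "Virtual-to-Physical" translation could look like when an incoming write is split into page-segmented DMAs. It is not the firmware or ASIP code described in the abstract. The 4 KiB page size, the linear scan of the buffer list, the dictionary used as a page table, and all names (`RegisteredBuffer`, `find_buffer`, `virt_to_phys`, `split_into_dmas`) are assumptions made for illustration only.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size; the abstract does not state one


@dataclass
class RegisteredBuffer:
    virt_start: int  # first virtual address of the registered buffer
    length: int      # buffer length in bytes


def find_buffer(buffers, virt_addr, length):
    """Scan the registered-buffer list for one that fully contains the write."""
    for buf in buffers:
        end = buf.virt_start + buf.length
        if buf.virt_start <= virt_addr and virt_addr + length <= end:
            return buf
    return None


def virt_to_phys(page_table, virt_addr):
    """Translate a virtual address via a virtual-page -> physical-page map."""
    phys_page = page_table[virt_addr // PAGE_SIZE]
    return phys_page * PAGE_SIZE + virt_addr % PAGE_SIZE


def split_into_dmas(buffers, page_table, virt_addr, length):
    """Return (physical_address, size) chunks, none crossing a page boundary."""
    if find_buffer(buffers, virt_addr, length) is None:
        raise ValueError("destination is not inside a registered buffer")
    chunks = []
    while length > 0:
        room = PAGE_SIZE - virt_addr % PAGE_SIZE  # bytes left in this page
        size = min(room, length)
        chunks.append((virt_to_phys(page_table, virt_addr), size))
        virt_addr += size
        length -= size
    return chunks


# Hypothetical example: a 3-page buffer whose pages are not physically contiguous.
buffers = [RegisteredBuffer(virt_start=0x10000, length=0x3000)]
page_table = {0x10: 7, 0x11: 3, 0x12: 9}
dmas = split_into_dmas(buffers, page_table, 0x10F00, 0x1200)
```

In this sketch a single 0x1200-byte write that starts near the end of a page becomes three DMAs, because physically the pages need not be adjacent. Both the list scan and the translation run once per packet, which is why their latency bounds the achievable bandwidth.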
