| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 424574 | Future Generation Computer Systems | 2015 | 10 Pages | |
- RDMA needs independent memory management by the NIC in the host/GPU virtual address space.
- This requires fast lookup of buffers and fast Virtual-to-Physical address translation.
- We developed an ASIP for the FPGA to accelerate these operations, with good results.
- The ASIP design was effective thanks to an architecture exploration toolsuite.

We developed a point-to-point, low-latency, 3D torus Network Controller, integrated in an FPGA-based PCIe board, that implements a Remote Direct Memory Access (RDMA) communication protocol. RDMA requires the ability to directly access the application memory of the remote node with minimal OS or CPU intervention. A key element to this end is a direct memory writing mechanism that addresses the destination buffers; on OSes that support virtual memory, this corresponds to a number of page-segmented DMAs. To minimally affect overall performance, both Virtual-to-Physical address translation and scanning of the registered buffer list need mechanisms with the lowest possible latency. In a first implementation, these tasks ran on a soft-core μC on the FPGA, which led to a latency of 1.6 μs to process a single packet and limited the peak bandwidth. As a second attempt, we present an accelerated version of these time-critical network functions that exploits an application-specific processor (ASIP) designed with a retargetable ASIP development toolsuite that allows architectural exploration. Benchmark results for the Buffer Search and Virtual-to-Physical tasks on the ASIP show latency improvements, with a cycle cost up to ten times lower than on the soft-core μC.
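
To make the two time-critical tasks concrete, the following is a minimal Python sketch of a registered-buffer search followed by Virtual-to-Physical translation, which together split one incoming write into page-segmented DMAs. It is an illustration under stated assumptions, not the firmware or ASIP code described in the abstract: the 4 KiB page size, the linear list scan, the per-buffer table of physical page frames, and all names (`RegisteredBuffer`, `find_buffer`, `virt_to_phys`, `split_into_dmas`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


@dataclass
class RegisteredBuffer:
    """Hypothetical record for one buffer registered for RDMA."""
    virt_start: int         # virtual start address of the buffer
    length: int             # buffer length in bytes
    page_frames: List[int]  # physical base address of each virtual page it spans


def find_buffer(buffers: List[RegisteredBuffer], virt_addr: int,
                length: int) -> Optional[RegisteredBuffer]:
    """Buffer Search: scan the list for a buffer that fully contains the range."""
    for buf in buffers:
        end = buf.virt_start + buf.length
        if buf.virt_start <= virt_addr and virt_addr + length <= end:
            return buf
    return None


def virt_to_phys(buf: RegisteredBuffer, virt_addr: int) -> int:
    """Virtual-to-Physical: look up the page frame, then add the in-page offset."""
    first_page = buf.virt_start // PAGE_SIZE
    page_index = virt_addr // PAGE_SIZE - first_page
    return buf.page_frames[page_index] + virt_addr % PAGE_SIZE


def split_into_dmas(buffers: List[RegisteredBuffer], virt_addr: int,
                    length: int) -> List[Tuple[int, int]]:
    """Return (physical address, size) pairs, one per page touched by the write."""
    buf = find_buffer(buffers, virt_addr, length)
    if buf is None:
        raise LookupError("destination range is not inside a registered buffer")
    dmas = []
    while length > 0:
        # A single DMA must not cross a page boundary, because consecutive
        # virtual pages may map to unrelated physical frames.
        chunk = min(length, PAGE_SIZE - virt_addr % PAGE_SIZE)
        dmas.append((virt_to_phys(buf, virt_addr), chunk))
        virt_addr += chunk
        length -= chunk
    return dmas


# Usage: a buffer spanning three virtual pages mapped to scattered frames.
registered = [RegisteredBuffer(0x10100, 0x2000, [0x80000, 0x23000, 0x51000])]
for phys, size in split_into_dmas(registered, 0x10F00, 0x300):
    print(hex(phys), size)
# prints:
# 0x80f00 256
# 0x23000 512
```

Both steps sit on the critical path of every received packet, which is why their cost in cycles bounds the achievable bandwidth. The linear scan here is only the simplest possible search strategy and was chosen for readability.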
