Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
523995 | 868541 | 2013 | 17-page PDF | Free download |
![First page of the article: Sharded Router: A novel on-chip router architecture employing bandwidth sharding and stealing](/preview/png/523995.png)
• We address the discrepancy between the channel width and the packet size for the first time.
• A novel on-chip router architecture is proposed that employs bandwidth-stealing and buffer-stealing.
• Detailed experiments demonstrate that the proposed router lowers the zero-load latency and enhances the throughput.
• The hardware synthesis analysis verifies the modest area overhead of the proposed router over a conventional design.
Packet-based networks-on-chip (NoC) are considered among the most viable candidates for the on-chip interconnection network of many-core chips. Unrelenting increases in the number of processing elements on a single chip die necessitate a scalable and efficient communication fabric. The resulting enlargement of the on-chip network size has been accompanied by an equivalent widening of the physical inter-router channels. However, the growing link bandwidth is not fully utilized, because the packet size is not always a multiple of the channel width. While slicing of the physical channel enhances link utilization, it incurs additional delay, because the number of flits per packet also increases. This paper proposes a novel router micro-architecture that employs fine-grained bandwidth “sharding” (i.e., partitioning) and stealing in order to mitigate the elevation in the zero-load latency caused by slicing. Consequently, the zero-load latency of the Sharded Router becomes identical to that of a conventional router, whereas its throughput is markedly improved by fully utilizing all available bandwidth. Detailed experiments using a full-system simulation framework indicate that the proposed router reduces the average network latency by up to 19% and the execution time of real multi-threaded workloads by up to 43%. Finally, hardware synthesis analysis verifies the modest area overhead of the Sharded Router over a conventional design.
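The trade-off the abstract describes, between wasted link bandwidth on a wide channel and extra flits on a sliced one, can be illustrated with a short calculation. The following is only a sketch: the channel widths (128 and 64 bits), the 576-bit packet size, and the function names are hypothetical values chosen for illustration and are not taken from the paper, and the code does not model the Sharded Router itself.

```python
def flits_per_packet(packet_bits: int, channel_bits: int) -> int:
    """Number of flits needed to carry a packet over a channel of the given width."""
    # Integer ceiling division: the last flit may be only partially filled.
    return -(-packet_bits // channel_bits)


def link_utilization(packet_bits: int, channel_bits: int) -> float:
    """Fraction of the transmitted link bits that carry actual packet data."""
    flits = flits_per_packet(packet_bits, channel_bits)
    return packet_bits / (flits * channel_bits)


if __name__ == "__main__":
    packet_bits = 576  # hypothetical packet size, not a multiple of 128
    for channel_bits in (128, 64):  # hypothetical wide channel vs. sliced channel
        flits = flits_per_packet(packet_bits, channel_bits)
        util = link_utilization(packet_bits, channel_bits)
        print(f"{channel_bits}-bit channel: {flits} flits, utilization {util:.0%}")
```

Under these assumed numbers, the 128-bit channel needs 5 flits and uses 90% of the transmitted bits, while the 64-bit slice reaches 100% utilization but needs 9 flits, which is the increase in serialization delay that the abstract says sharding and stealing are meant to offset.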
Journal: Parallel Computing - Volume 39, Issue 9, September 2013, Pages 372–388