
Ethernet remains the most widely used network architecture today owing to its low cost and backward compatibility with the existing Ethernet infrastructure. Driven by the increasing networking demands of cloud workloads, network speeds are rapidly migrating from 1 to 10 Gbps and beyond. Ethernet's ubiquity and its continuously increasing rate motivate us to fully understand high speed network processing performance and its power efficiency. In this paper, we begin with a per-packet processing overhead breakdown on Intel Xeon servers with 10 GbE networking. We find that, besides data copy, the driver and buffer release unexpectedly take 46% of the processing time for large I/O sizes, and as much as 54% for small I/O sizes. To further understand these overheads, we manually instrument the 10 GbE NIC driver and the OS kernel along the packet processing path using hardware performance counters (PMUs). Our fine-grained instrumentation pinpoints performance bottlenecks that have not been reported before. In addition to the detailed performance analysis, we also examine the power consumption of network processing over 10 GbE using a power analyzer. We then use an external Data Acquisition System (DAQ) to obtain a breakdown of power consumption for individual hardware components such as the CPU, memory, and NIC, and make several interesting observations. Our detailed performance and power analysis guides us toward a more processing- and power-efficient server I/O architecture for high speed networks.
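The paper's instrumentation is manual and placed inside the 10 GbE NIC driver and kernel along the packet path; that code is not reproduced here. As a rough, user-space illustration of the counter-based measurement style it describes, the sketch below reads a single hardware cycle counter around a code region via Linux's perf_event_open(2). The choice of event and the measured region are assumptions for demonstration only.

```c
/* Hypothetical sketch: count CPU cycles spent in a code region using a
 * hardware performance counter via perf_event_open(2). The paper's actual
 * instrumentation is in-kernel, along the 10 GbE packet-processing path. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;  /* count CPU cycles */
    attr.disabled = 1;                       /* start disabled, enable explicitly */
    attr.exclude_kernel = 0;                 /* include kernel-mode cycles */

    int fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* ... region of interest, e.g. a packet receive/processing loop ... */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t cycles = 0;
    read(fd, &cycles, sizeof(cycles));       /* fetch the accumulated count */
    printf("cycles in region: %llu\n", (unsigned long long)cycles);

    close(fd);
    return 0;
}
```

Per-function breakdowns like those in the paper would repeat this pattern (or its in-kernel equivalent) around each stage of interest, e.g. the driver receive routine, data copy, and buffer release.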
► Our per-packet processing overhead breakdown points out three major overheads.
► Our fine-grained instrumentation in the OS locates bottlenecks and yields several observations.
► We find that unlike 1 GbE NICs (∼1 W), 10 GbE NICs have almost 10 W idle power dissipation.
► Our measurement shows that network processing over 10 GbE has high power consumption.
► The power breakdown reveals that the major contributor is CPU, followed by main memory.
Journal: Journal of Parallel and Distributed Computing - Volume 72, Issue 11, November 2012, Pages 1442–1449