Article code | Journal code | Publication year | Article (English) | Full-text version |
---|---|---|---|---|
432989 | 689190 | 2016 | 11-page PDF | Free download |
• Identify GPU programming challenges for high throughput DPI implementation.
• Develop an analytical GPU performance model to explore the design space.
• Present algorithm and implementation co-optimization techniques.
• Prototype a deep packet inspection engine on a GPU with up to 150 Gb/s throughput.
The Graphics Processing Unit (GPU) is a promising platform for implementing Deep Packet Inspection (DPI): its rich parallelism supports high throughput, and its programmability accommodates frequent pattern updates. However, achieving a high-performance implementation is challenging because GPU performance is sensitive to algorithm and implementation issues such as memory overhead, thread divergence, and large lookup table sizes. In this paper, we propose algorithm and implementation co-optimization techniques that achieve high performance by reducing the required memory, removing thread divergence, optimizing memory access patterns, and optimizing for multithreading. To lower the implementation cost, a GPU performance model is developed to detect bottlenecks and guide the design of the GPU kernel. Based on these optimization techniques, a prototype DPI implementation achieves 150 Gb/s on a single NVIDIA K20 GPU.
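The core of a DPI engine of this kind is multi-pattern matching, commonly compiled into a deterministic finite automaton (DFA) whose transitions are stored as a dense lookup table: every thread then executes the same table-indexing step per input byte, regardless of packet content, which is one way thread divergence can be removed. The paper's actual kernel and data layout are not given here; the following is a minimal CPU-side sketch of that general idea (Aho–Corasick compiled to a dense table, with hypothetical patterns):

```python
from collections import deque

def build_dfa(patterns, alphabet_size=256):
    """Compile byte patterns into a dense Aho-Corasick DFA.

    Returns a full transition table dfa[state][byte] -> state and a
    per-state match flag. A dense table means the scanning loop is a
    single lookup per byte with no data-dependent branching."""
    goto = [[-1] * alphabet_size]   # state 0 is the root
    out = [False]
    for pat in patterns:            # build the trie
        s = 0
        for b in pat:
            if goto[s][b] == -1:
                goto.append([-1] * alphabet_size)
                out.append(False)
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s] = True
    # Breadth-first pass: compute failure links and densify the table.
    fail = [0] * len(goto)
    q = deque()
    for c in range(alphabet_size):
        if goto[0][c] == -1:
            goto[0][c] = 0
        elif goto[0][c] != 0:
            q.append(goto[0][c])    # depth-1 states fail to the root
    while q:
        s = q.popleft()
        out[s] = out[s] or out[fail[s]]   # inherit matches via failure link
        for c in range(alphabet_size):
            t = goto[s][c]
            if t == -1:
                goto[s][c] = goto[fail[s]][c]
            else:
                fail[t] = goto[fail[s]][c]
                q.append(t)
    return goto, out

def scan(payload, dfa, match_flags):
    """Divergence-free inner loop: one table lookup per byte, as each
    GPU thread would execute it over its assigned packet data."""
    s = 0
    hit = False
    for b in payload:
        s = dfa[s][b]
        hit = hit or match_flags[s]
    return hit
```

For GPU use, the table would be laid out to suit the memory hierarchy (the memory-access and table-size optimizations are exactly what the paper's co-optimization targets); this sketch only shows why the per-byte work is uniform across threads.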
Journal: Journal of Parallel and Distributed Computing - Volume 88, February 2016, Pages 46–56