Analysis of NUMA effects in modern multicore systems for the design of high-performance data transfer applications

Article ID	Journal	Published Year	Pages	File Type
4950357	Future Generation Computer Systems	2017	10 Pages	PDF

Abstract

To capitalize on multicore power, modern high-speed data transfer applications usually adopt a multi-threaded design and aggregate multiple network interfaces. However, NUMA introduces another dimension of complexity to these applications. In this paper, we conduct comprehensive experiments on real systems to illustrate the importance of NUMA-awareness to applications with intensive memory accesses and network I/O. Instead of simply attributing the NUMA effect to the physical layout, we provide an in-depth analysis of the underlying interactions inside hardware devices. We profile system performance by monitoring relevant hardware counters, and reveal how the NUMA penalty occurs during the prefetch and cache synchronization processes. Based on this analysis, we implement a thread mapping module in BBCP, a bulk data transfer application, as a practical example of enabling NUMA-awareness. The enhanced application is then evaluated on our high-performance testbed with storage area networks (SAN). Our experimental results show that the proposed NUMA optimizations can significantly improve BBCP's performance, both in memory-based tests with various contention levels and in realistic data transfers involving SAN-based storage.
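
To make the idea of NUMA-aware thread mapping concrete, the following is a minimal sketch in Python, assuming a Linux host that exposes NUMA topology through sysfs. It is not BBCP's actual module, whose design is not described in this abstract. The function names (`parse_cpulist`, `node_of_interface`, `cpus_of_node`, `plan_thread_mapping`, `pin_current_thread`) and the round-robin placement policy are hypothetical choices made for illustration.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_of_interface(iface):
    """Return the NUMA node a PCI network interface is attached to.

    Assumption: the device exposes /sys/class/net/<iface>/device/numa_node.
    The kernel reports -1 when the node is unknown.
    """
    with open(f"/sys/class/net/{iface}/device/numa_node") as f:
        return int(f.read().strip())


def cpus_of_node(node):
    """Return the CPU ids that belong to the given NUMA node (Linux sysfs)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def plan_thread_mapping(num_threads, local_cpus):
    """Assign worker threads round-robin to CPUs local to the NIC's node.

    The round-robin policy is an illustrative assumption, not the paper's.
    """
    if not local_cpus:
        raise ValueError("no CPUs available on the target NUMA node")
    return {t: local_cpus[t % len(local_cpus)] for t in range(num_threads)}


def pin_current_thread(cpu):
    """Pin the calling thread to one CPU (Linux only; pid 0 means the caller)."""
    os.sched_setaffinity(0, {cpu})
```

On a suitable Linux machine, a worker with index `i` could call `pin_current_thread(plan_thread_mapping(n, cpus_of_node(node_of_interface("eth0")))[i])` at startup, where `"eth0"` is a placeholder interface name. Keeping the worker on a core local to the interface is intended to keep its buffers, typically allocated on the node where the thread first touches them, on the same node as the device.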

Keywords

input/output