کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
431812 | 688634 | 2013 | 11 صفحه PDF | دانلود رایگان |

• We model wimpy-multicore and brawny-multicore based on the closed queuing network.
• We explore a large design space along multiple dimensions.
• There is an optimal core number at which the highest throughput is achieved.
• If workload is memory-bound with cache, the brawny-core system is much better.
In this paper, we conduct a coarse-granular comparative analysis of wimpy (i.e., simple) fine-grain multicore processors against brawny (i.e., complex) simultaneous multithreaded (SMT) multicore processors for server applications with strong request-level parallelism. We explore a large design space along multiple dimensions, including the number of cores, the number of threads, and a wide range of workloads.For strong CPU-bound workload, a 2R-core wimpy-multicore processor is found to be on par with an R-core brawny-multicore processor in terms of throughput performance. For strong memory-bound workload, core-level multithreading is largely ineffective for both wimpy-multicore and brawny-multicore processors, except for the case of low core and thread counts per memory/disk interface. For both wimpy-multicore and brawny-multicore, there is an optimal core number at which the highest throughput performance is achieved, which reduces, as the workload becomes deeper memory-bound. Moreover, there is a threshold core number for a wimpy-multicore, beyond which it is outperformed by its brawny-multicore counterpart. These behaviors indicate that brawny-multicores are better choices than wimpy-multicores in terms of throughput performance.
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 10, October 2013, Pages 1351–1361