Article code: 432814
Journal code: 689083
Publication year: 2011
English article: 10-page PDF
Full-text version: Free download
English title of the ISI article
Resolving a L2-prefetch-caused parallel nonscaling on Intel Core microarchitecture
Related topics
Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics
English abstract

Parallel workloads on shared-memory multi-core processors often suffer from performance degradation. Cache eviction, true/false sharing and bus contention are among the well-understood causes of this problem. This paper presents a study showing that the L2 DPL (data prefetch logic) in processors based on the Intel Core microarchitecture can also be a cause. Using image integration as a case study, it identifies a nonscaling problem in the parallel integration of images whose size exceeds the capacity of the processor’s L2 cache. An analysis of the relevant performance events with the Intel VTune™ Performance Analyzer finds that the L2 DPL is less effective at prefetching the needed data during parallel integration than during serial integration. To resolve the problem, a novel parallel reverse image loading is developed to reduce the number of memory accesses during the parallel integration and the associated delay. Experimental results demonstrate that parallel integration after parallel reverse loading achieves a significant speedup over the same parallel integration after serial loading.


► The L2 DPL in Intel Core architecture can be ineffective for parallel workloads.
► The cause is an increase in the number of prefetches delayed by demand requests.
► Algorithms increasing the number of reused cache lines may overcome the problem.
► A parallel reverse image loading is developed as such an example.
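
The abstract does not give the loading algorithm itself, so the following is only a minimal sketch of one plausible reading of "parallel reverse loading": each worker fills its own chunk of the image from the last row to the first, so the rows that the later forward integration touches first are the ones written most recently. Every name here (`chunk_bounds`, `load_chunk_reversed`, `integrate_chunk`, `load_then_integrate`) is hypothetical, a plain pixel sum stands in for the paper's integration, and Python illustrates only the access order, not the cache behaviour or the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_rows, n_workers):
    """Split rows [0, n_rows) into contiguous, nearly equal (start, stop) chunks."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def load_chunk_reversed(source, image, start, stop):
    """Copy rows stop-1 down to start, so the chunk's first rows are touched last."""
    order = []
    for r in range(stop - 1, start - 1, -1):
        image[r] = list(source[r])
        order.append(r)
    return order  # returned only so the access order can be inspected


def integrate_chunk(image, start, stop):
    """Stand-in integration: sum pixels in forward order, starting with the
    rows that the reverse load wrote most recently."""
    return sum(sum(image[r]) for r in range(start, stop))


def load_then_integrate(source, n_workers):
    """Phase 1: parallel reverse loading. Phase 2: parallel forward integration.

    Assumption: a real implementation would keep each chunk on the same core
    in both phases; a thread pool does not guarantee that.
    """
    n_rows = len(source)
    image = [None] * n_rows
    bounds = chunk_bounds(n_rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        orders = list(pool.map(lambda b: load_chunk_reversed(source, image, *b), bounds))
        partial = list(pool.map(lambda b: integrate_chunk(image, *b), bounds))
    return orders, sum(partial)
```

Calling `load_then_integrate(source, 2)` on a six-row image loads rows 2, 1, 0 in one chunk and 5, 4, 3 in the other, then integrates each chunk from its first row forward. Whether the paper partitions by rows, and how it maps chunks to cores, is not stated in the abstract.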

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 71, Issue 7, July 2011, Pages 915–924
Authors
,