Article code: 523830
Journal code: 868503
Publication year: 2016
English article: 14-page PDF
Full-text version: free download
English title of the ISI article
A dynamic block-level execution profiler
Persian translation of the title
Block-level execution profile
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract


• We introduce a hardware-based mechanism to dynamically profile application blocks.
• Profiling information is used to prioritize critical memory loads during execution.
• Our mechanism yields better accuracy and performance gains than previous proposals.
• We extensively analyze how our mechanism improves performance.
• Results show that it alleviates prefetch inter-core interference.

Most performance-enhancing mechanisms in current processors, such as branch predictors or prefetchers, rely on program characteristics monitored at the granularity of single instructions. However, many of these characteristics can be obtained at the basic-block level instead. The coarser granularity allows a larger portion of the code to be examined, enabling more accurate profiling and a detailed analysis of the different types of instructions executed within a block. Block-level analysis can therefore benefit performance-enhancing mechanisms, as it shows how instructions influence each other and thus reveals complex behavior patterns.

In this paper, we present the Dynamic Block-Level Execution Profiler (DBLEP), an online mechanism that operates at the basic-block level and profiles micro-architectural bottlenecks such as delinquent memory loads, hard-to-predict branches, and contention for functional units. DBLEP provides information that can be used to reduce the impact of these bottlenecks. We developed a prefetch dropping scheme and a memory controller policy that use the code profiling information provided by DBLEP. By taking advantage of the high profiling accuracy, these mechanisms improve processor performance by up to 18.6% (5.3% on average). We show that our mechanism's performance is comparable to that of mechanisms that work at single-instruction granularity, while using less hardware.
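
The abstract does not describe DBLEP's internal structures, so the following is only a rough software sketch of the general idea of block-level profiling: events are accumulated per basic block rather than per instruction, and loads from blocks that miss often are served first. The class names, counters, thresholds, and the `prioritize` policy are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlockProfile:
    """Event counters for one basic block (hypothetical fields)."""
    executions: int = 0
    load_misses: int = 0
    branch_mispredictions: int = 0


class BlockProfiler:
    """Toy software model of a per-block profile table.

    Assumption: this is NOT the DBLEP hardware design, only an
    illustration of profiling at basic-block granularity.
    """

    def __init__(self, miss_threshold=0.5, min_executions=4):
        # Both thresholds are illustrative values, not from the paper.
        self.miss_threshold = miss_threshold
        self.min_executions = min_executions
        self.table = defaultdict(BlockProfile)

    def record(self, block_addr, load_misses=0, branch_mispredictions=0):
        """Accumulate the events observed during one execution of a block."""
        profile = self.table[block_addr]
        profile.executions += 1
        profile.load_misses += load_misses
        profile.branch_mispredictions += branch_mispredictions

    def is_critical(self, block_addr):
        """A block is 'critical' if it averages enough load misses per run."""
        profile = self.table.get(block_addr)  # .get avoids creating an entry
        if profile is None or profile.executions < self.min_executions:
            return False
        return profile.load_misses / profile.executions >= self.miss_threshold

    def prioritize(self, requesting_blocks):
        """Order pending memory requests so critical blocks are served first.

        sorted() is stable, so requests keep their arrival order within
        the critical and non-critical groups.
        """
        return sorted(requesting_blocks, key=lambda addr: not self.is_critical(addr))
```

For example, after recording four executions of a block at `0x400` with one load miss each and four miss-free executions of a block at `0x800`, `prioritize([0x800, 0x400])` places `0x400` first. A single table entry per block, rather than per instruction, is what lets one entry summarize several loads and branches at once.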

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 54, May 2016, Pages 15–28
Authors
, , , , ,