Article ID | Journal ID | Publication year | English article | Full-text version |
---|---|---|---|---|
523859 | 868508 | 2016 | 10-page PDF | Free download |

• We describe the key details of our data-oriented extensions to the Callgrind profiler.
• We introduce a novel technique for data partitioning targeting performance.
• We demonstrate the applicability of our methodology based on cache miss histograms.
• We showcase the potential benefits of our proposal.
Profiling is of great assistance in understanding and optimizing an application’s behavior. Today’s profiling techniques help developers focus on the pieces of code leading to the highest penalties according to a given performance metric. In this paper we describe a profiling tool we have developed by extending the Valgrind framework and one of its tools: Callgrind. Our extended profiling tool provides new object-differentiated profiling capabilities that help software developers and hardware designers (1) understand access patterns, (2) identify unexpected access patterns, and (3) determine whether a particular memory object consistently exhibits a troublesome access pattern. We use this tool to assist in partitioning large data objects so that smaller portions of them can be placed in the small, fast memory subsystems, such as scratchpad memories, of heterogeneous memory systems. We showcase the potential benefits of this technique by means of the XSBench miniapplication from the CESAR codesign project. The benefits include being able to identify the optimal portion of data to be placed in a small scratchpad memory, leading to more than 19% performance improvement, compared with nonassisted partitioning approaches, in our proposed scratchpad-equipped compute node.
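The idea of using a cache miss histogram to choose which portion of a large object to place in a scratchpad can be illustrated with a small sketch. The abstract does not give the selection procedure, so this is not the paper's algorithm. It assumes, purely for illustration, that the histogram holds one miss count per equally sized chunk of a single object and that the scratchpad holds a contiguous run of `window_bins` chunks. The function name `best_window` and the sample counts are hypothetical.

```python
from typing import List, Tuple


def best_window(miss_histogram: List[int], window_bins: int) -> Tuple[int, int]:
    """Return (start_bin, misses) for the contiguous run of bins with the most misses.

    Assumption: each bin covers an equally sized chunk of one data object,
    and the scratchpad can hold exactly `window_bins` consecutive chunks.
    """
    if window_bins <= 0 or window_bins > len(miss_histogram):
        raise ValueError("window_bins must be between 1 and len(miss_histogram)")

    # Misses covered by the first candidate window.
    current = sum(miss_histogram[:window_bins])
    best_start, best_misses = 0, current

    # Slide the window one bin at a time, updating the running sum.
    for start in range(1, len(miss_histogram) - window_bins + 1):
        current += miss_histogram[start + window_bins - 1] - miss_histogram[start - 1]
        if current > best_misses:
            best_start, best_misses = start, current

    return best_start, best_misses


# Hypothetical per-chunk cache miss counts for one object (not measured data).
histogram = [5, 40, 90, 120, 30, 10, 2, 1]
start, misses = best_window(histogram, window_bins=2)
coverage = misses / sum(histogram)
print(f"place bins {start}..{start + 1} in the scratchpad "
      f"({coverage:.0%} of this object's misses)")
```

A sliding window keeps the selected portion contiguous, which matches the notion of placing one smaller portion of an object in fast memory. A real tool would also need to account for object layout, capacity in bytes, and competition among several objects.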
Journal: Parallel Computing - Volume 51, January 2016, Pages 46–55