Article code: 523902
Journal code: 868525
Publication year: 2014
English article: 11 pages, PDF
Full-text version: free download
English title of the ISI article
Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract


• We propose a new thread synchronization heuristic called Min-SP/PC.
• Min-SP/PC handles function calls better than previous algorithms.
• Many instructions in SPMD programs are identical across threads.
• Many memory accesses are either uniform or affine across threads.
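
The last highlight distinguishes uniform accesses (every thread touches the same address) from affine ones (the address is a linear function of the thread identifier). A minimal sketch of that classification follows. The function name `classify_access` and the convention that list index equals thread identifier are assumptions made for illustration, not details taken from the paper.

```python
def classify_access(addresses):
    """Classify one simultaneous memory access across threads.

    `addresses[tid]` is the address issued by thread `tid`
    (hypothetical layout, assumed for this sketch).
    """
    if not addresses:
        raise ValueError("need at least one thread address")
    base = addresses[0]
    # Uniform: all threads access the same address.
    if all(a == base for a in addresses):
        return "uniform"
    # Affine: address == base + tid * stride for a single constant stride.
    stride = addresses[1] - base
    if all(a == base + tid * stride for tid, a in enumerate(addresses)):
        return "affine"
    return "irregular"


if __name__ == "__main__":
    print(classify_access([0x1000, 0x1000, 0x1000, 0x1000]))
    print(classify_access([0x1000, 0x1004, 0x1008, 0x100C]))
    print(classify_access([0x1000, 0x1004, 0x1010, 0x100C]))
```

A uniform or affine access can be described by a base address and a stride alone, which is what makes such regularity attractive to hardware.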

Simultaneous Multi-Threading (SMT) is a hardware model in which different threads share the same processing unit. This model is a compromise between high parallelism and low hardware cost. Minimal Multi-Threading (MMT) is a recently proposed architecture that shares instruction decoding and execution between threads running the same program in an SMT processor, thereby generalizing the approach followed by Graphics Processing Units to general-purpose processors. In this paper we propose new ways to expose redundancies in the MMT execution model. First, we propose and evaluate a new thread reconvergence heuristic that handles function calls better than previous approaches. Our heuristic only inspects the program counter and the stack frame to reconverge threads; hence, it is amenable to efficient and inexpensive hardware implementation. Second, we demonstrate that this heuristic is able to reveal the existence of substantial regularity in inter-thread memory access patterns. We validate our results on data-parallel applications from the PARSEC and SPLASH suites. Our new reconvergence heuristic increases the throughput of our MMT model by 7% compared to a previous, substantially more complex approach due to Long et al. Moreover, it gives us an effective way to increase regularity in memory accesses. We have observed that over 70% of simultaneous memory accesses are either the same for all the threads, or are affine expressions of the thread identifier. This observation motivates the design of newly proposed hardware that benefits from regularity in inter-thread memory accesses.
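
The abstract says the reconvergence heuristic inspects only the program counter and the stack frame. One plausible reading of the name Min-SP/PC is sketched below: fetch next for the threads with the lowest stack pointer, breaking ties by the lowest program counter. The `Thread` record, the function `select_min_sp_pc`, the downward-growing stack, and the exact priority order are all assumptions made for illustration; the abstract does not spell them out.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int  # thread identifier
    sp: int   # stack pointer (assumed: stack grows downward)
    pc: int   # program counter


def select_min_sp_pc(threads):
    """Pick the threads to fetch for next (illustrative sketch only).

    Assumed policy: prefer the smallest SP (deepest call nesting when the
    stack grows downward), then the smallest PC. Threads that share the
    winning (sp, pc) pair run together, so lagging threads get a chance
    to catch up and reconverge with the others.
    """
    if not threads:
        return []
    key = min((t.sp, t.pc) for t in threads)
    return [t for t in threads if (t.sp, t.pc) == key]


if __name__ == "__main__":
    pool = [
        Thread(0, sp=0x7FF0, pc=0x400),  # still in the caller
        Thread(1, sp=0x7FE0, pc=0x520),  # inside the callee
        Thread(2, sp=0x7FE0, pc=0x520),  # inside the callee, same PC
        Thread(3, sp=0x7FE0, pc=0x530),  # inside the callee, further ahead
    ]
    print([t.tid for t in select_min_sp_pc(pool)])
```

Under these assumptions, threads 1 and 2 are selected first because they share the lowest stack pointer and the lowest program counter within that frame.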

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 40, Issue 9, October 2014, Pages 548–558