Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
523862 | 868508 | 2016 | 9-page PDF | Free download |
• We describe a novel model for executing distributed-memory parallel programs using uncoordinated tasks.
• We describe several off-line optimizations for the proposed model.
• We examine the effects of these optimizations on modern processors with wider vector units.
• Increasing levels of task coalescence can improve throughput and increase performance.
• Increases in performance are observed in both single-node and multi-node experiments.
We propose a distributed dataflow execution model that uses a distributed dictionary for data memoization, allowing each parallel task to schedule instructions without direct inter-task coordination. We provide a description of the proposed model, including autonomous dataflow task selection. We also describe a set of optimization strategies that improve the overall throughput of stencil programs executed using this model on modern multi-core and vectorized architectures.
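The idea of uncoordinated tasks sharing memoized results can be illustrated with a minimal single-process sketch. It is not the authors' implementation. An ordinary Python `dict` stands in for the distributed dictionary, the workers are simulated round-robin rather than run in parallel, and the three-point averaging stencil, the `(time step, cell index)` key layout, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (time step, cell index); hypothetical key layout


def dependencies(key: Key, n_cells: int) -> List[Key]:
    """Inputs of one stencil cell; boundary cells simply carry their value forward."""
    t, i = key
    if i == 0 or i == n_cells - 1:
        return [(t - 1, i)]
    return [(t - 1, i - 1), (t - 1, i), (t - 1, i + 1)]


def select_task(store: Dict[Key, float], order: List[Key], n_cells: int) -> Optional[Key]:
    """Autonomous selection: pick the first uncomputed cell whose inputs are memoized."""
    for key in order:
        if key not in store and all(d in store for d in dependencies(key, n_cells)):
            return key
    return None


def run(initial: List[float], steps: int, n_workers: int = 3) -> List[float]:
    n = len(initial)
    # The shared dictionary is the only channel between workers (assumed stand-in).
    store: Dict[Key, float] = {(0, i): v for i, v in enumerate(initial)}
    all_keys = [(t, i) for t in range(1, steps + 1) for i in range(n)]

    # Each worker scans the same key space from a different starting offset.
    orders = []
    for w in range(n_workers):
        off = (w * len(all_keys)) // n_workers
        orders.append(all_keys[off:] + all_keys[:off])

    # Simulated round-robin execution; workers never message each other.
    while len(store) < n * (steps + 1):
        for order in orders:
            key = select_task(store, order, n)
            if key is None:
                continue
            inputs = [store[d] for d in dependencies(key, n)]
            store[key] = sum(inputs) / len(inputs)  # memoize the result

    return [store[(steps, i)] for i in range(n)]
```

Because every cell is a deterministic function of memoized inputs, the result does not depend on how many workers run or in what order they pick tasks, so `run(data, 4, n_workers=1)` and `run(data, 4, n_workers=5)` return the same list.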
Journal: Parallel Computing - Volume 51, January 2016, Pages 79–87