Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
432402 | 688881 | 2013 | 19-page PDF | Free download |

Recent advances in neuroscientific understanding have highlighted the highly parallel computational power of the mammalian neocortex. In this paper we describe a GPGPU-accelerated implementation of an intelligent learning model inspired by the structural and functional properties of the neocortex. We then examine two inefficiencies inherent to our initial implementation and propose software optimizations to mitigate them. Analysis of our application’s behavior and performance provides important insights into the GPGPU architecture, including the number of cores, the memory system, atomic operations, and the global thread scheduler. Additionally, we create a runtime profiling tool for the cortical network that distributes work proportionally across the host CPU and the multiple GPGPUs available to the system. Using the profiling tool together with these optimizations on Nvidia’s CUDA framework, we achieve up to 60× speedup over a single-threaded CPU implementation of the model.
► Detailed investigation of the performance of the cortical network and of the proposed optimizations.
► Application and evaluation of the optimizations on two other GPU applications.
► Detailed insight into GPU architecture with regard to tuning GPU applications.
► First work to demonstrate profiling and work distribution on heterogeneous GPU systems.
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 7, July 2013, Pages 953–971
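
The abstract mentions a runtime profiling tool that distributes work proportionally across the host CPU and multiple GPGPUs, but does not say how the shares are computed. The following is a minimal sketch of one plausible scheme, not the authors' implementation. It assumes each device's throughput has already been measured by a short profiling run; the function name `proportional_split`, the device labels, and the throughput figures are all hypothetical.

```python
def proportional_split(total_items, throughputs):
    """Split total_items across devices in proportion to measured throughput.

    throughputs maps a device label to a non-negative rate (e.g. work items
    per second from a profiling run). Both the labels and the idea of a
    separate profiling run are assumptions made for this sketch.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if any(rate < 0 for rate in throughputs.values()):
        raise ValueError("throughputs must be non-negative")
    total_rate = sum(throughputs.values())
    if total_rate <= 0:
        raise ValueError("at least one device must have positive throughput")

    # Ideal (fractional) share for each device.
    exact = {dev: total_items * rate / total_rate
             for dev, rate in throughputs.items()}

    # Round down, then hand the leftover items to the devices with the
    # largest fractional parts so the shares sum exactly to total_items.
    shares = {dev: int(x) for dev, x in exact.items()}
    leftover = total_items - sum(shares.values())
    by_fraction = sorted(exact, key=lambda dev: exact[dev] - shares[dev],
                         reverse=True)
    for dev in by_fraction[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical profiling result: gpu0 is 6x and gpu1 is 3x the CPU rate.
    print(proportional_split(1000, {"cpu": 1.0, "gpu0": 6.0, "gpu1": 3.0}))
```

Rounding down and then assigning the remainder keeps every work item assigned exactly once, which matters when the items are discrete units that cannot be divided between devices.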