Article ID: 6486824
Journal: Computational Biology and Chemistry
Published Year: 2018
Pages: 14
File Type: PDF
Abstract
To improve performance, we classify the sources of inefficiency in the underlying systolic array architecture. Eliminating idle regions wherever possible improves efficiency by up to 95%. In addition, adaptive load balancing distributes work between host and accelerator so that using an accelerator always yields a performance gain; in GPU-constrained scenarios this provides up to 45% more performance.
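The abstract does not describe how the load balancer decides the split, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes work can be divided into independent items, that host and accelerator throughput (items per second) can be measured, and that the accelerator pays a fixed start-up cost. The names split_work, update_rate, host_rate, accel_rate, and accel_overhead are hypothetical.

```python
def split_work(n_items, host_rate, accel_rate, accel_overhead=0.0):
    """Split n_items so host and accelerator finish at about the same time.

    host_rate, accel_rate: measured throughput in items per second (assumed).
    accel_overhead: fixed accelerator start-up/transfer cost in seconds (assumed).
    Returns (host_items, accel_items).
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if host_rate <= 0 or accel_rate <= 0:
        raise ValueError("rates must be positive")

    # Solve h / host_rate == accel_overhead + (n - h) / accel_rate for h.
    host_share = (
        host_rate * (accel_overhead * accel_rate + n_items)
        / (host_rate + accel_rate)
    )

    # If the host could finish everything before the accelerator has even
    # started, the share reaches n_items and the accelerator gets nothing,
    # so offloading never makes the run slower than host-only execution.
    host_items = min(n_items, round(host_share))
    return host_items, n_items - host_items


def update_rate(old_rate, items_done, elapsed, smoothing=0.5):
    """Blend a newly observed throughput into the running estimate."""
    observed = items_done / elapsed
    return smoothing * observed + (1.0 - smoothing) * old_rate


if __name__ == "__main__":
    # Accelerator three times faster than the host, no start-up cost.
    print(split_work(1000, host_rate=100.0, accel_rate=300.0))
    # Small job with a 2 s start-up cost: the host keeps all the work.
    print(split_work(100, host_rate=100.0, accel_rate=300.0, accel_overhead=2.0))
```

Re-estimating the rates with update_rate after each batch is what would make such a scheme adaptive: the split follows the measured speeds rather than a fixed ratio.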