Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6486824 | Computational Biology and Chemistry | 2018 | 14 Pages |
Abstract
To improve performance, we classify the sources of inefficiency in the underlying systolic array architecture. Eliminating idle regions as far as possible improves efficiency by up to 95%. Moreover, adaptive load balancing distributes work between host and accelerator so that using an accelerator always results in a performance improvement; in GPU-constrained scenarios this yields up to 45% more performance.
Related Topics
Physical Sciences and Engineering
Chemical Engineering
Bioengineering
Authors
Ernst Joachim Houtgast, Vlad-Mihai Sima, Koen Bertels, Zaid Al-Ars