Article ID Journal Published Year Pages File Type
4956763 Microprocessors and Microsystems 2017 19 Pages PDF
Abstract
In this paper we propose a shared memory, parametrizable architecture to compute the DP matrix for the Knapsack 0/1. Our system has a parallel runtime of Θ(mC/q) for a knapsack of capacity C with m items and q operators. Using additional off-chip space and DMA transfers it can solve knapsacks of any size. The architecture is implemented on the ZYNQ-7020 System On Chip (SoC) that contains a dual-core ARM plus Artix FPGA fabric. Under such architecture we make use of 64-bit High Performance ports for off-chip transfers and asymmetric 32-bit write/64-bit read BRAMs to minimize data exchange times. We also exploit computation synchronization to minimize BRAM address propagation and reduce routing congestion. We present results for a base system with 70 Processing Elements (PEs) capable of solving problems with a maximum item weight ωmax=1024. For more complex instances we configure the architecture with 58 PEs and ωmax=6144, where a single BRAM is shared among 13 computing units. We thus solve problems with 6  ×  bigger weights than previous works, attain a 16  ×  speed-up versus an optimized software on an Intel Xeon E5 and get the highest efficiency per core versus other architectures. We achieve between 2.4−3.3× acceleration versus previous FPGA solutions.
Related Topics
Physical Sciences and Engineering Computer Science Computer Networks and Communications
Authors
, , , , ,