Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4956763 | Microprocessors and Microsystems | 2017 | 19 Pages |
Abstract
In this paper we propose a shared memory, parametrizable architecture to compute the DP matrix for the Knapsack 0/1. Our system has a parallel runtime of Î(mC/q) for a knapsack of capacity C with m items and q operators. Using additional off-chip space and DMA transfers it can solve knapsacks of any size. The architecture is implemented on the ZYNQ-7020 System On Chip (SoC) that contains a dual-core ARM plus Artix FPGA fabric. Under such architecture we make use of 64-bit High Performance ports for off-chip transfers and asymmetric 32-bit write/64-bit read BRAMs to minimize data exchange times. We also exploit computation synchronization to minimize BRAM address propagation and reduce routing congestion. We present results for a base system with 70 Processing Elements (PEs) capable of solving problems with a maximum item weight Ïmax=1024. For more complex instances we configure the architecture with 58 PEs and Ïmax=6144, where a single BRAM is shared among 13 computing units. We thus solve problems with 6ââ¯Ã⯠bigger weights than previous works, attain a 16ââ¯Ã⯠speed-up versus an optimized software on an Intel Xeon E5 and get the highest efficiency per core versus other architectures. We achieve between 2.4â3.3à acceleration versus previous FPGA solutions.
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Networks and Communications
Authors
Fernando A. Escobar, Anthony Kolar, Naim Harb, Filipe Vinci Dos Santos, Carlos Valderrama,