Scalable shared-memory architecture to solve the Knapsack 0/1 problem

Article ID	Journal	Published Year	Pages	File Type
4956763	Microprocessors and Microsystems	2017	19 Pages	PDF

Abstract

In this paper we propose a shared memory, parametrizable architecture to compute the DP matrix for the Knapsack 0/1. Our system has a parallel runtime of Î(mC/q) for a knapsack of capacity C with m items and q operators. Using additional off-chip space and DMA transfers it can solve knapsacks of any size. The architecture is implemented on the ZYNQ-7020 System On Chip (SoC) that contains a dual-core ARM plus Artix FPGA fabric. Under such architecture we make use of 64-bit High Performance ports for off-chip transfers and asymmetric 32-bit write/64-bit read BRAMs to minimize data exchange times. We also exploit computation synchronization to minimize BRAM address propagation and reduce routing congestion. We present results for a base system with 70 Processing Elements (PEs) capable of solving problems with a maximum item weight Ïmax=1024. For more complex instances we configure the architecture with 58 PEs and Ïmax=6144, where a single BRAM is shared among 13 computing units. We thus solve problems with 6ââ¯Ãâ¯ bigger weights than previous works, attain a 16ââ¯Ãâ¯ speed-up versus an optimized software on an Intel Xeon E5 and get the highest efficiency per core versus other architectures. We achieve between 2.4â3.3Ã acceleration versus previous FPGA solutions.

Keywords

Dynamic programming shared memory SOC FPGA