Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
542683 | Integration, the VLSI Journal | 2016 | 14 Pages |

• Configurable interleaving network supports any 2^i-input (0 ≤ i ≤ n) network. A new low-complexity address generator for interleaving.
• Optimized decoding schedule reduces performance loss caused by parallel turbo decoding.
• Configurable memory architecture is proposed to avoid memory access contention. Dual-mode ACS unit for both radix-2 and radix-4 processing.
• The proposed parallel turbo decoder supports all 188 block sizes in LTE.

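For readers unfamiliar with the interleaver behind these highlights: LTE turbo codes use a quadratic permutation polynomial (QPP) interleaver, π(i) = (f1·i + f2·i²) mod K. The sketch below is not the paper's address generator or network. It shows the textbook addition-only recursion for generating QPP addresses and a brute-force check of the contention-free property that parallel decoders rely on. The function names are hypothetical, and the (f1, f2) pairs for K = 40 and K = 6144 are quoted from the interleaver parameter table in 3GPP TS 36.212 and should be checked against the specification before reuse.

```python
def qpp_direct(K, f1, f2):
    """Interleaved addresses pi(i) = (f1*i + f2*i*i) mod K for i = 0..K-1."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def qpp_recursive(K, f1, f2):
    """Same addresses using only modular additions.

    pi(i+1) = pi(i) + g(i) (mod K), with g(i) = f1 + f2 + 2*f2*i,
    so g(i+1) = g(i) + 2*f2 (mod K).
    """
    addr, step, out = 0, (f1 + f2) % K, []
    for _ in range(K):
        out.append(addr)
        addr = (addr + step) % K
        step = (step + 2 * f2) % K
    return out


def is_contention_free(perm, P):
    """True if P sub-decoders, each working on a window of W = K/P symbols,
    address P distinct memory banks (bank = address // W) at every step."""
    K = len(perm)
    W = K // P
    return all(
        len({perm[i + t * W] // W for t in range(P)}) == P
        for i in range(W)
    )


# Assumed (f1, f2) values for the smallest and largest LTE block sizes.
PARAMS = {40: (3, 10), 6144: (263, 480)}

for K, (f1, f2) in PARAMS.items():
    perm = qpp_direct(K, f1, f2)
    assert sorted(perm) == list(range(K))      # pi is a permutation of 0..K-1
    assert perm == qpp_recursive(K, f1, f2)    # recursion matches the formula
    print(K, perm[:5], is_contention_free(perm, 8))
```

The bank check passes for any parallelism P that divides K, because π(i + tW) and π(i) leave the same remainder modulo W; addresses that are distinct but share a remainder must fall into different banks.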
In this paper, a high-performance parallel turbo decoder is designed to support all 188 block sizes in the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) standard. A novel configurable quadratic permutation polynomial (QPP) multistage network and address generator are proposed to reduce the complexity of interleaving. This 2^n-input network can be configured to support any 2^i-input (0 ≤ i ≤ n) network. Furthermore, it can flexibly support arbitrary contention-free interleavers by cascading an additional specially designed network. In addition, an optimized decoding schedule is presented to reduce the performance loss caused by high parallelism. The memory architecture and address mapping method are optimized to avoid memory access contention for small blocks. Moreover, a dual-mode add–compare–select (ACS) unit implementing both radix-2 and radix-4 recursion is proposed to support block sizes that are not divisible by 16. Implemented in 130 nm CMOS technology, the design achieves a peak throughput of 384.3 Mbps at a clock rate of 290 MHz with 5.5 iterations. Occupying 4.02 mm² of core area and consuming 716 mW, the decoder has an architecture efficiency of 1.81 bits/cycle/iteration/mm² and an energy efficiency of 0.34 nJ/bit/iteration, which is competitive with other recent works.
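The two efficiency figures at the end of the abstract can be reproduced from the other quoted numbers. The sketch below assumes the usual normalizations, which the abstract does not spell out: architecture efficiency as throughput times iterations divided by clock rate times core area, and energy efficiency as power divided by throughput times iterations. The variable names are illustrative only.

```python
# Figures quoted in the abstract.
throughput_bps = 384.3e6   # peak throughput, bits per second
clock_hz = 290e6           # clock rate
iterations = 5.5           # decoding iterations
core_area_mm2 = 4.02       # core area
power_w = 0.716            # power consumption

# Assumed definitions (not stated explicitly in the abstract).
arch_eff = throughput_bps * iterations / (clock_hz * core_area_mm2)
energy_nj = power_w / (throughput_bps * iterations) * 1e9

print(f"{arch_eff:.2f} bits/cycle/iteration/mm^2")
print(f"{energy_nj:.2f} nJ/bit/iteration")
```

Under these assumptions the results round to the 1.81 and 0.34 stated in the abstract, which suggests these are the definitions in use.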