Article ID Journal Published Year Pages File Type
6885906 Microprocessors and Microsystems 2018 9 Pages PDF
Abstract
The fused-multiply-add (FMA) instruction is a common instruction in RISC processors since 1990. A 3-stage, 8-level pipelined, dual-precision FMA is proposed here that can perform operations either at one double precision (SISD) or at two single precision in parallel (SIMD). The 53-bit mantissa-multiplier (MM) is optimally segmented by Karatsuba-Offman (KO) algorithm such that both modes can be performed. The 6-stage pipelined MM uses only 6 of 10 multipliers and 13 of 33 adder/subtractors in SIMD. Thus hardware area of the proposed MM is reduced by 23.82% and throughput is maintained to be 923M samples/s. The arithmetic operational units in the data path are shared among the modes by having four data rearrangement units (DRU) which rearranges the data systematically at the input, the outputs of MM and the final output. Though these DRUs bring some hardware overhead, the resulting architecture is modular and uniform for both modes of computation. The proposed FMA has been implemented using TSMC 1P6M CMOS 130 nm library and takes 48% less overall area and consumes 49% less power at 308.7 MHz compared to previous results. The area-delay-product (ADP), 0.48 × 10−15 shows that the area optimization by proposed KO based MM can also keep the computation time as 3.24 ns.
Related Topics
Physical Sciences and Engineering Computer Science Computer Networks and Communications
Authors
, , , ,