Article ID: 539630
Journal: Integration, the VLSI Journal
Published Year: 2015
Pages: 11 Pages
File Type: PDF
Abstract

• Our BTWC processor tolerates worst-case delays without impact on performance thanks to latency-insensitive design (LID) and variable-latency (VL) units.
• A fine-grain LID pipeline interlock handles stalls caused by control and data hazards, late memory access, and VL execution.
• Clock frequency increases by 23% in a 45-nm CMOS technology compared to the worst-case approach.
• Variable-latency penalties occur just in the worst process corner. No performance degradation ensues in common and best-case corners.
• A light-weight software built-in self-test (BIST) assigns two-cycle execution of critical instructions only when needed.

Variability of process parameters in nanometer CMOS circuits makes the standard worst-case design methodology waste many of the advantages of scaling. A common-case design, though, is a perilous alternative, as it gives up much of the design yield. The better-than-worst-case (BTWC) design methodology reconciles performance and yield. In this paper we present a BTWC RISC processor that tolerates the extra worst-case delays of critical paths without significant impact on overall performance. We obtain this result by coupling latency-insensitive design (LID) with variable-latency (VL) units. A software built-in self-test (BIST) checks each VL unit individually to determine whether to activate it. Compared to a worst-case approach, the RISC clock frequency increases by 23% in a 45-nm CMOS technology. The impact of VL on instructions per cycle is confined to the worst process corner and is very limited, as we show through a set of benchmarks.
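The trade-off between the higher clock frequency and the two-cycle penalty can be illustrated with a toy throughput model. This is only a sketch under stated assumptions, not the evaluation method of the paper: it assumes that each instruction taking the two-cycle path costs exactly one extra cycle and that nothing else changes. The 23% frequency gain comes from the abstract; the base CPI and the two-cycle fractions are hypothetical values chosen for illustration.

```python
def relative_throughput(freq_gain, base_cpi, two_cycle_fraction):
    """Throughput of the faster-clocked design relative to a worst-case baseline.

    Toy model (assumption): every instruction that takes the two-cycle path
    adds exactly one cycle to the average cycles per instruction (CPI).
    """
    vl_cpi = base_cpi + two_cycle_fraction
    return (1.0 + freq_gain) * base_cpi / vl_cpi


def break_even_fraction(freq_gain, base_cpi):
    """Two-cycle fraction at which the frequency gain is fully cancelled."""
    return freq_gain * base_cpi


FREQ_GAIN = 0.23  # clock frequency increase reported in the abstract
BASE_CPI = 1.0    # hypothetical baseline CPI, not taken from the paper

# Hypothetical fractions of instructions that need two-cycle execution.
for fraction in (0.0, 0.05, 0.10):
    ratio = relative_throughput(FREQ_GAIN, BASE_CPI, fraction)
    print(f"two-cycle fraction {fraction:.2f}: relative throughput {ratio:.3f}")

print(f"break-even fraction: {break_even_fraction(FREQ_GAIN, BASE_CPI):.2f}")
```

In this model, a fraction of zero corresponds to the common and best-case corners, where no VL unit is switched to two-cycle execution and the full frequency gain is kept. In the worst corner the gain shrinks as the two-cycle fraction grows, and it vanishes only when that fraction reaches the break-even value.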

Related Topics
Physical Sciences and Engineering > Computer Science > Hardware and Architecture