journal homepage: www.elsevier.com/locate/mejo

# Timing slack monitoring under process and environmental variations: Application to a DSP performance optimization

B. Rebaud<sup>a</sup>, M. Belleville<sup>a</sup>, E. Beigné<sup>a</sup>, C. Bernard<sup>a</sup>, M. Robert<sup>b</sup>, P. Maurine<sup>b,\*</sup>, N. Azemard<sup>b</sup>

<sup>a</sup> CEA, LETI, MINATEC campus, F38054 Grenoble Cedex, France <sup>b</sup> LIRMM, Montpellier, France

#### ARTICLE INFO

Article history: Received 27 August 2010; received in revised form 1 February 2011; accepted 8 February 2011; available online 10 March 2011

Keywords: Variability; Monitor; Timing slack; Process compensation

### ABSTRACT

To compensate for variability effects in advanced technologies, Process, Voltage, Temperature (PVT) monitors are mandatory for Adaptive Voltage Scaling (AVS) or Adaptive Body Biasing (ABB) techniques. This paper describes a new monitoring system that anticipates failures in real time by observing the timing slack of a pre-defined set of observable flip-flops. The system is made of dedicated sensor structures located near the monitored flip-flops, coupled with a specific timing detection window generator embedded within the clock-tree. Validation and performance results, simulated in a 45 nm low power technology, demonstrate a scalable, low power and low area system that is compatible with a standard CAD flow. Gains of an AVFS scheme based on these structures over a standard DVFS scheme are given for a 32-bit VLIW DSP.

© 2011 Elsevier Ltd. All rights reserved.

## 1. Introduction

Although the scaling of CMOS technologies has brought amazing integration capabilities, it has also recently led to a dramatic increase in design margins [1], mainly explained by the increase of process variations and the conservatism of worst case analyses and CAD tools. To cope with this design margin increase, statistical techniques [2–4] have been identified as a means to obtain better estimations and thus as a solution to reduce design margins while maintaining a good yield.

Unfortunately, statistical techniques cannot be applied to take dynamic variations into account. As a result, dynamic variations such as voltage and temperature variations [5,6] or ageing effects [7] are still handled by considering worst case situations. To get around this limitation and further reduce design margins, two different approaches may be adopted.

The first one consists in integrating specific structures or sensors that monitor in real time the physical and electrical parameters required to dynamically control the operating frequency, the supply voltage and/or the substrate biasing. Several process, voltage and temperature (PVT) sensors have been proposed in the literature [8–12] for global variability compensation. However, the use of such PVT sensors has some limitations.

Firstly, their area and power consumption may be high, so their number has to be limited. Secondly, their use requires the integration of complex control functions in look-up tables (LUT) and intensive characterization of the chip behavior with respect to the considered PVT variables. Finally, the use of Ring Oscillator (RO) structures [8,9,12] to monitor the circuit speed can be another limitation, since the PVT sensitivities of an RO may be quite different from those of real circuit datapaths. However, this can be partly overcome by adopting the replica path approach proposed in [13], which consists in monitoring the speed of some critical paths that are duplicated in the sensors to replace the traditional RO.

The second approach to compensating PVT variations and ageing effects is to directly monitor the sampling elements of the chip (latches or D-type flip-flops) to detect the occurrence of timing faults. This can be achieved by inserting specific structures or by using ad-hoc sampling elements [14–17] that detect a timing violation, either by performing a delayed comparison or by detecting a signal transition within a given time window. The main advantage of this approach is its ability to detect the effects on timings of local and dynamic variations occurring in the close vicinity of the inserted specific structures. A second and significant advantage is the simple, binary sensor output.

However, this second approach also has some drawbacks. Indeed, a large number of sensors might be required to obtain full coverage of the circuit. Thus, these structures must be as small as possible and consume little energy when the circuit operates correctly. In addition, the detection of an error requires the full replay of the faulty data processing at a lower speed [14–17]; this may be an issue if the faulty data has been broadcast to the rest of the chip, or if the data processing flow does not allow such an interruption.

<sup>\*</sup> Corresponding author. Tel.: +33 467418520; fax: +33 467418500. E-mail address: pmaurine@lirmm.fr (P. Maurine).

0026-2692/\$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.mejo.2011.02.005

To cope with this issue, it has been proposed in [19–21] to anticipate the system failure by checking the timing slack of the monitored flip-flop through the insertion of a local prevention delay. Although these solutions avoid replaying faulty computations, they cannot detect fast voltage drops or timing shifts. As a result, reduced timing margins must be considered. The implementation given in [19] triples the area of the sequential element, while the observation window in [20] highly depends on the frequency. The implementation proposed in the pioneering work [21] relies on a gate level implementation of the monitoring system, which is thus area consuming.

One should be aware that this concept cannot catch fast dynamic variations whose effect would imply delay fluctuations greater than the prevention delay taken.
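The limitation above can be illustrated with a minimal behavioral sketch. It is not taken from [19–21]: the function names, the per-cycle arrival times and all numerical values (in picoseconds) are hypothetical, and the prevention delay is modeled simply as an extra delay added to the data arrival time before it is checked against the setup deadline.

```python
def classify_cycle(arrival, period, t_setup, prevention_delay):
    """Classify one clock cycle of a flip-flop guarded by a prevention delay.

    All arguments are times in picoseconds (hypothetical values).
    """
    deadline = period - t_setup            # latest arrival without a setup violation
    if arrival > deadline:
        return "error"                     # the monitored flip-flop itself fails
    if arrival + prevention_delay > deadline:
        return "warning"                   # only the delayed check fails: anticipation
    return "ok"


def first_event(arrivals, period, t_setup, prevention_delay):
    """Return the first non-ok classification in a sequence of cycles."""
    for arrival in arrivals:
        state = classify_cycle(arrival, period, t_setup, prevention_delay)
        if state != "ok":
            return state
    return "ok"


# Hypothetical operating point: 1000 ps period, 50 ps setup, 100 ps prevention delay.
PERIOD, T_SETUP, PREVENTION = 1000, 50, 100

slow_drift = [800, 830, 860, 890, 920, 960]   # delay grows by 30 to 40 ps per cycle
fast_drop = [800, 810, 980]                   # delay jumps by 170 ps in one cycle

print(first_event(slow_drift, PERIOD, T_SETUP, PREVENTION))
print(first_event(fast_drop, PERIOD, T_SETUP, PREVENTION))
```

In this toy model the slow drift is flagged as a warning before any error occurs, whereas the jump of 170 ps exceeds the 100 ps prevention delay and leads directly to an error.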

Within this context, the contribution of this paper is twofold: a new monitoring structure and its associated design flow. The monitoring system, in line with the concepts of [15,19,20], aims at anticipating timing violations induced by process, temperature and slow voltage drifts over a wide range of operating conditions. This new timing slack monitor may allow the application of dynamic voltage and/or frequency scaling as well as body-biasing strategies or, at least, provide valuable information to a global monitoring system integrating voltage, temperature and process sensors.

The proposed system locally detects critical timing slacks and monitors on the fly their evolution with PVT variations or ageing phenomena such as NBTI or HCI [7]. One key feature of the system is the generation of the detection window. Indeed, contrary to [20], which uses the falling edge of the clock, the detection window is generated by specific Clock-tree Cells (*CC*) directly integrated into the clock network. This solution allows fine tuning of the position in time and of the width of the detection window. Note however that the insertion of these *CC*s must be done with specific care, to avoid tedious iterations in the design flow and not to deteriorate the clock skew.

The rest of this paper is organized as follows. Section 2 gives an overview of the whole monitoring system. The two specific standard cells (the sensor cell and the Clock-tree Cell) required to integrate such a monitoring system are then described in detail in Section 3. Section 4 introduces the integration flow that has been used to integrate the proposed monitoring system, including the CC insertion into the clock-tree. Section 5 gives some validation results related to the integration of the monitoring system in a 32-bit VLIW DSP designed in a 45 nm technology. A conclusion is finally drawn in Section 6.

## 2. Monitoring system concept

Fig. 1 shows the proposed monitoring system, which is composed of two standard cell library elements: a sensor and a specific Clock-tree Cell (*CC*). The sensors are intended to be inserted close to the D-type flip-flops (*DFF*) located at the endpoints of critical timing paths of the design, while *CC*s are inserted at the associated clock leaves.

The sensor, acting as a stability checker, is directly connected to a datapath output, i.e. to a DFF input. It also receives a pulse *CP* defining the observation window *DW*, of duration *dw*, provided periodically by the *CC*. The edges of *CP* are in phase with *CLK\_DFF*, the flip-flop clock also generated within the *CC*. The basic function of the sensor is to detect the occurrence of a full or partial transition of the signal *In\_A* during this detection window. When this event occurs, the monitor latches a warning signal (i.e. *QN* switches from $V_{dd}$ to 0).
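The stability-checker behavior described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the transistor-level sensor of the paper: the class name `StabilitySensor`, the representation of *In\_A* activity as a list of transition times, the explicit `reset` method and the numerical values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StabilitySensor:
    """Behavioral stand-in for the sensor: QN = 1 models Vdd, QN = 0 a warning."""

    qn: int = 1

    def observe(self, transition_times, window_start, window_end):
        """Latch a warning if In_A toggles inside [window_start, window_end].

        transition_times: times (ps) at which In_A changes during the cycle.
        window_start, window_end: bounds (ps) of the detection window set by CP.
        """
        if any(window_start <= t <= window_end for t in transition_times):
            self.qn = 0                    # warning latched, held until reset
        return self.qn

    def reset(self):
        """Clear the latched warning (assumed to be done by a controller)."""
        self.qn = 1


sensor = StabilitySensor()
print(sensor.observe([700], 850, 950))   # In_A settles well before the window
print(sensor.observe([900], 850, 950))   # In_A toggles inside the window
print(sensor.observe([700], 850, 950))   # warning stays latched
```

The model treats any transition inside the window as a detection; partial transitions, which the real sensor also catches, are not distinguished here.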

To obtain the effective detection window $DW_{eff}$, of width $dw_{eff}$, the internal propagation delays of the sensor have to be considered (Fig. 2).

In\_A-to-CP\_r and In\_A-to-CP\_f are the sensor internal delay differences between the internal edges produced by In\_A and by the rising and falling edges of CP, respectively. CP-to-CLK\_DFF is the delay between the CP and CLK\_DFF rising edges. With these notations, the effective detection window starts at [In\_A-to-CP\_r + CP\_r-to-CLK\_DFF] before the rising edge of the CLK\_DFF node and ends at [In\_A-to-CP\_f − CP\_f-to-CLK\_DFF] before the falling edge of CLK\_DFF. Thus the effective width $dw_{eff}$ of the effective detection window $DW_{eff}$ is fixed by both the internal structure of the sensor and the timing characteristics of the CC element. More precisely, the effective width is $dw_{eff} \approx (\text{CP\_r-to-CLK\_DFF} + \text{CP\_f-to-CLK\_DFF}) + (\text{In\_A-to-CP\_r} - \text{In\_A-to-CP\_f}) = dw + (\text{In\_A-to-CP\_r} - \text{In\_A-to-CP\_f})$. Note that at first order $dw_{eff} \approx dw$.

A key point here is that [In\_A-to-CP\_f − CP\_f-to-CLK\_DFF] must be slightly greater than the setup-time *Tsetup* of the monitored DFF, or at least equal to it, if one expects to detect timing warnings rather than timing errors.
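As a worked illustration of the window bounds and of the setup-time condition, the sketch below evaluates the bracketed expressions of the text for one set of delays. The function names and all numerical values (in picoseconds) are hypothetical and are not characterization data from the paper.

```python
def detection_window(in_a_to_cp_r, in_a_to_cp_f, cp_r_to_clk_dff, cp_f_to_clk_dff):
    """Return (start, end, width) of the effective detection window, in ps.

    start and end are the two bracketed offsets given in the text;
    width is their difference, i.e. the effective width dw_eff.
    """
    start = in_a_to_cp_r + cp_r_to_clk_dff
    end = in_a_to_cp_f - cp_f_to_clk_dff
    return start, end, start - end


def warns_before_error(window_end, t_setup):
    """True if the window closes no later than the setup time before the edge."""
    return window_end >= t_setup


# Hypothetical delays (ps): nearly matched In_A-to-CP delays, so dw_eff is close to dw.
start, end, dw_eff = detection_window(
    in_a_to_cp_r=95, in_a_to_cp_f=100, cp_r_to_clk_dff=120, cp_f_to_clk_dff=20
)
dw = 120 + 20   # CP_r-to-CLK_DFF + CP_f-to-CLK_DFF

print(start, end, dw_eff, dw)
print(warns_before_error(end, t_setup=60))
```

With these values the window opens 215 ps and closes 80 ps before the clock edge, giving an effective width of 135 ps against a nominal width of 140 ps; since 80 ps exceeds the assumed 60 ps setup time, a late transition raises a warning before it becomes a setup violation.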

To take into account the uncertainties on the setup-time (*Tsetup*) estimations obtained during the design steps, a design guard margin *Gm* can be added to the value of *Tsetup*. In this case, if during the last clock cycle the timing slack *Tm* (before occurrence of a setup-time violation) is lower than [In\_A-to-CP\_r + CP\_r-to-CLK\_DFF − *Tsetup* − *Gm*] and greater than [In\_A-to-CP\_f − CP\_f-to-CLK\_DFF − *Tsetup* − *Gm*], the



Fig. 1. Monitor system implemented on a path.
