

Contents lists available at ScienceDirect

# INTEGRATION, the VLSI journal



# Clock buffer polarity assignment under useful skew constraints

### Deokjin Joo, Taewhan Kim\*

Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea

## ARTICLE INFO

Keywords: Clock trees; Setup/hold times; Useful clock skew; Timing; Clock polarity assignment; Power/ground noise

## ABSTRACT

Clock trees, which deliver the clock signal to every clock sink in the whole system, switch actively at high frequency, which makes them one of the most dominant sources of noise. While many clock polarity assignment (PA) techniques have been proposed to mitigate the clock noise, no attention has been paid to PA under useful skew constraints. In this work, we show that the clock PA problem under useful skew constraints is intractable and propose a comprehensive and scalable clique-search-based algorithm to solve the problem effectively. In addition, we demonstrate the applicability of our solution by extending it to PA under delay variations. Through experiments with ISPD'10 benchmark circuits, we show that our proposed clock PA algorithm reduces the peak noise by a further 10.9% over that of the conventional PA constrained by a global skew bound. Finally, we compare our PA technique against the decoupling capacitor embedding technique, which is a commonly used method for noise reduction.

#### 1. Introduction

The rapid advancement in CMOS technology scaling has enabled the development of high-performance and highly integrated chips. However, the increased power density requires the scaling of supply voltages to keep the power consumption under budget. This scaling in turn decreases the noise margins, making circuits more susceptible to power/ground noise. Power/ground noise is caused by the simultaneous switching of circuits as they draw/drain current from/to the power/ground rails, inducing voltage fluctuations. Especially in synchronous high-speed circuits, the clock network is a major source of this noise, since its clock buffers switch simultaneously at high frequency, at the rising and falling edges of the clock signal. This noise adversely affects not only the signal integrity of the chip but also the circuit performance, due to the voltage drop/rise at the power/ground supply rails [1]. To mitigate this problem, several techniques have been developed, including decoupling capacitor insertion, clock skew scheduling, and polarity assignment (PA).

Decoupling capacitors (decaps) are the most popular and straightforward method for reducing power supply noise. This technique, which has been in use for over 40 years [2], intentionally places a large capacitor in the power distribution network. Although powerful, decaps incur a large area overhead.

Clock skew scheduling, sometimes referred to as *useful skew scheduling*, is a technique for improving circuit robustness. It deliberately introduces differences in the clock signal arrival times at the clock sinks to meet certain goals that the designer sets.<sup>1</sup> Fishburn [3] borrowed time from paths with more time slack for more critical paths to improve circuit performance. Wang et al. [4] proposed to utilize clock skew to improve timing yield. Several works (e.g., [5-9]) have utilized clock skew to reduce simultaneous switching noise by spreading the peak current over the time domain. Benini et al. [5] first proposed to reduce the peak current through clock skew scheduling. Vittal et al. [6] formulated and solved the problem as a 0–1 integer linear program. Lam et al. [7] proposed a graph-based approach. Later, Huang et al. [8] extended the problem to consider multi-domain clock systems. Most recently, Vijayakumar et al. [9] proposed a fast heuristic method that can find a near-optimal solution within minutes on large circuits.


On the other hand, clock polarity assignment (PA) techniques provide another means to disperse noise [10-14]. These techniques replace some of the buffers in the clock tree with inverters, thus changing the *polarity* of the clock signal. To compensate, the flip-flops (FFs) affected by the introduced inverters are replaced with negative-edge triggered FFs. Fig. 1 illustrates the idea behind the clock PA. Buffers, which are constructed by cascading two differently sized inverters, draw larger current from the power rail at the rising edge of the clock signal than at the falling edge, as illustrated in Fig. 1(a). Inverters, on the other hand, behave oppositely, as shown in Fig. 1(b). Consequently, by mixing buffers and inverters in a clock tree, the designer is able to shift the timing of the switching current. We divide the time period into many intervals, which are called *time sampling slots* in this work, and take the maximum $I_{DD}/I_{SS}$ current values in each slot to calculate the upper bound of the noise values. P+/P- are used to denote the peak current values of $I_{DD}$ at the rising/falling edge of the clock, as shown in Fig. 1.

http://dx.doi.org/10.1016/j.vlsi.2016.11.007

Received 26 April 2016; Received in revised form 29 July 2016; Accepted 24 November 2016; Available online 29 November 2016. 0167-9260/© 2016 Published by Elsevier B.V.

<sup>\*</sup> Corresponding author.

E-mail address: tkim@ssl.snu.ac.kr (T. Kim).

<sup>1</sup> (Global) *clock skew* is defined as the difference between the latest and the earliest clock signal arrival times at the clock sinks.

Fig. 1. The idea behind clock buffer polarity assignment. Mixing buffers and inverters in a clock network makes it possible to spread the $I_{DD}/I_{SS}$ current over the time period. (a) Buffers draw larger $I_{DD}/I_{SS}$ current at the rising/falling edge of the clock signal. (b) Inverters exhibit the opposite behavior of (a). (c) Peak noise occurs around the time when the leaf buffers (blue color) switch. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
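The slot-based noise estimate described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `slot_upper_bounds`, the sample waveforms, and the choice of summing each element's per-slot maximum are assumptions made here to show why taking per-slot maxima yields an upper bound on the true peak.

```python
def slot_upper_bounds(waveforms, samples_per_slot):
    """Upper-bound the total current in each time sampling slot.

    waveforms: one list of current samples per buffering element, all
    aligned on the same time axis (hypothetical input format).
    For every slot, each element contributes its maximum sample in that
    slot, so the sum can never be below the true summed current there.
    """
    num_samples = len(waveforms[0])
    bounds = []
    for start in range(0, num_samples, samples_per_slot):
        end = start + samples_per_slot
        bounds.append(sum(max(w[start:end]) for w in waveforms))
    return bounds


# Two made-up current waveforms (arbitrary units), six samples each.
waveforms = [
    [0, 2, 5, 1, 0, 0],
    [0, 0, 1, 4, 6, 1],
]

bounds = slot_upper_bounds(waveforms, samples_per_slot=2)
true_peak = max(sum(samples) for samples in zip(*waveforms))

print("per-slot upper bounds:", bounds)
print("estimated peak:", max(bounds), "true peak:", true_peak)
```

Using fewer samples per slot tightens the bound at the cost of more slots to evaluate, which is the trade-off behind fine-grained sampling.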

Nieh et al. [10] first proposed the idea of PA. They divided the clock tree into two subtrees and replaced the root driving buffer in one of the subtrees with an inverter, thereby assigning the opposite polarity to half of the clock tree. Although this reduced the noise for the whole chip, the buffers in each subtree still switched simultaneously, so the noise remained a problem locally. To address this issue, Samanta et al. [11] proposed to mix buffers and inverters throughout the clock tree, so that roughly half of the clock buffering elements are inverters. While this reduced the clock noise significantly, it increased the clock skew. Chen, Ho, and Hwang [12] focused the PA on leaf buffering elements. They observed that leaf buffers, which directly drive the sinks (FFs), outnumber the non-leaf buffers, making the leaves the dominant source of noise emitted by the clock tree (see Fig. 1(c)). Hence, by assigning polarity only to the leaves, it was possible to reduce the noise while minimally impacting the clock skew. Jang et al. [15] proposed an integrated approach that combines PA with buffer sizing to utilize the clock skew and further reduce the noise. Lu and Taskin [16] attempted to assign polarity to non-leaf buffers at the expense of increased clock skew. Later, in [13], they proposed to perform skew tuning on the polarity-assigned clock trees to reduce the clock skew at the worst corner. Joo and Kim [14] proposed a method for better noise estimation by fine-grained sampling of the noise current waveform. Kang and Kim [17] considered delay variations in the PA. They performed PA that minimizes the power/ground noise while meeting the skew yield constraint that arises from clock skew variation. Recent research shows that polarities may be adjusted after chip fabrication by using XOR gates and double-edge-triggered flip-flops, which makes clock-gating-mode-aware noise reduction possible [18,19].

While plenty of research works have addressed the PA problem, they all share one feature: they constrain the PA by a global clock skew bound. However, for high-performance circuits, it is necessary to set a tight clock skew bound since the available timing margin is small. This means that it becomes much harder to exploit the clock PA to minimize the worst noise under such a tight bound. In contrast, clock PA under useful skew constraints will be more effective than PA constrained by a global clock skew bound, in the sense that it is able to check the setup and hold time constraints between sinks *individually* in the course of the PA, where some sink pairs have loose time margins while others have tight ones. To the authors' knowledge, this is the first work to assign clock polarity under useful clock skew constraints. In this work, we focus on the leaf clock buffering elements only and propose a comprehensive solution to the problem of clock PA integrated with buffer/inverter sizing to reduce clock switching noise (a preliminary version of this paper was presented in [20]). Precisely, (1) we show that the PA problem under useful skew constraints is NP-complete; (2) we propose a clique-search-based scalable algorithm that is able to trade off between solution quality and run time; (3) the proposed algorithm produces library-based (practical) solutions, so that the optimized buffers and inverters can be taken from the given library; (4) we demonstrate that our method can be effectively extended to PA under delay variations; and (5) we compare the effectiveness of our PA technique against the decap embedding technique. Also, we observe that decap embedding and PA can be applied without conflict.
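The difference between a global skew bound and individually checked constraints can be sketched as follows. This is only an illustration under stated assumptions: the constraint form `lower <= t[i] - t[j] <= upper`, the helper names, and all numeric bounds below are hypothetical and are not taken from the paper.

```python
def global_skew(arrivals):
    """Global clock skew: latest minus earliest arrival time."""
    return max(arrivals) - min(arrivals)


def meets_global_bound(arrivals, bound):
    """Conventional check: one bound applied to all sinks at once."""
    return global_skew(arrivals) <= bound


def meets_pairwise(arrivals, constraints):
    """Useful-skew style check: each sink pair has its own window.

    constraints: list of (i, j, lower, upper) tuples, each meaning
    lower <= arrivals[i] - arrivals[j] <= upper (assumed form).
    """
    return all(lo <= arrivals[i] - arrivals[j] <= hi
               for i, j, lo, hi in constraints)


# Hypothetical arrival times at four sinks.
arrivals = [15, 13, 12, 12]

# Hypothetical per-pair windows: some pairs are loose, some are tight.
constraints = [
    (0, 1, -2, 2),   # tight pair
    (0, 2, -4, 3),   # looser pair
    (3, 2, -1, 1),   # tight pair
]

print("global skew:", global_skew(arrivals))
print("passes global bound of 2:", meets_global_bound(arrivals, 2))
print("passes pairwise windows:", meets_pairwise(arrivals, constraints))
```

With these made-up numbers, the arrival times violate a global bound of 2 yet satisfy every pairwise window, which is the kind of extra freedom a PA under useful skew constraints can exploit.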

#### 2. Motivational example


Consider the small clock tree shown in Fig. 2(a). It has four clock sinks, labeled DFF<sub>0</sub>, ..., DFF<sub>3</sub>, each driven by its own leaf clock buffer, labeled $n_0, ..., n_3$. The initial clock signal arrival times at DFF<sub>0</sub> through DFF<sub>3</sub> are 15, 11, 11, and 11, respectively, as indicated by $t_0$, $t_1$, $t_2$, and $t_3$. Assume that the setup and hold time constraints are pre-calculated and given as in Fig. 2(b). Given



(a) Clock design with four clock buffers

(b) Setup and hold time constraints, given as lower and upper bounds on pairwise differences of the clock arrival times $t_0, \ldots, t_3$

(c) Library of buffers/inverters

| Type | $\Delta t$ | Noise P+ | Noise P- |
|------|------------|----------|----------|
| B1   | 0          | 10       | 3        |
| B2   | +2         | 12       | 3        |
| I1   | 0          | 3        | 9        |
| I2   | +1         | 3        | 11       |

(d) Eight feasible PA with sizing

| # | $n_0$ | $n_1$ | $n_2$ | $n_3$ | Clock skew | Noise P+ | Noise P- | Worst noise |
|---|-------|-------|-------|-------|------------|----------|----------|-------------|
| 1 | B1    | B2    | B2    | B2    | 2          | 46       | 12       | 46          |
| 2 | B1    | B2    | B2    | I2    | 3          | 37       | 20       | 37          |
| 3 | B1    | B2    | I2    | B2    | 3          | 37       | 20       | 37          |
| 4 | B1    | B2    | I2    | I2    | 3          | 28       | 28       | 28          |
| 5 | I1    | B2    | B2    | B2    | 2          | 39       | 18       | 39          |
| 6 | I1    | B2    | B2    | I2    | 3          | 30       | 26       | 30          |
| 7 | I1    | B2    | I2    | B2    | 3          | 30       | 26       | 30          |
| 8 | I1    | B2    | I2    | I2    | 3          | 21       | 34       | 34          |

Fig. 2. An illustration of the clock buffer PA problem. (a) A small clock tree with four clock buffers and four sinks (FFs). (b) Setup and hold time constraints between the sinks. (c) Available types of buffer/inverter and their impact on clock signal arrival times ($\Delta t$). Our brute-force analysis with the given information reveals that, of the 4<sup>4</sup> = 256 candidate assignments, only eight are *feasible*, i.e., cause no time constraint violation. (d) The eight feasible clock PAs of the design in (a) using the library in (c) that satisfy the time constraints in (b). Assignment #4 leads to the lowest peak noise, while the conventional PA constrained by a clock skew bound selects assignment #5 as having the least peak noise, which is 39% higher than that of assignment #4.
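The arithmetic behind Fig. 2 can be re-derived with a short script. This is a sketch under stated assumptions rather than the authors' code. The library values, the initial arrival times, and the eight feasible assignments are taken from the example. The setup/hold constraints of Fig. 2(b) are not encoded, so the feasible set is simply copied from Fig. 2(d). The skew bound of 2 used for the conventional PA is an assumption (it is the smallest skew any assignment can reach with this library).

```python
from itertools import product

# Library of Fig. 2(c): cell type -> (delta_t, P+, P-)
LIBRARY = {
    "B1": (0, 10, 3),
    "B2": (2, 12, 3),
    "I1": (0, 3, 9),
    "I2": (1, 3, 11),
}
# Initial clock arrival times t_0..t_3 from the example
INITIAL_ARRIVAL = (15, 11, 11, 11)

# The eight feasible assignments (n_0, n_1, n_2, n_3), copied from Fig. 2(d)
FEASIBLE = [
    ("B1", "B2", "B2", "B2"),
    ("B1", "B2", "B2", "I2"),
    ("B1", "B2", "I2", "B2"),
    ("B1", "B2", "I2", "I2"),
    ("I1", "B2", "B2", "B2"),
    ("I1", "B2", "B2", "I2"),
    ("I1", "B2", "I2", "B2"),
    ("I1", "B2", "I2", "I2"),
]


def evaluate(assignment):
    """Return (skew, P+, P-, worst noise) for one cell type per leaf."""
    arrivals = [t + LIBRARY[cell][0]
                for t, cell in zip(INITIAL_ARRIVAL, assignment)]
    skew = max(arrivals) - min(arrivals)
    p_plus = sum(LIBRARY[cell][1] for cell in assignment)
    p_minus = sum(LIBRARY[cell][2] for cell in assignment)
    return skew, p_plus, p_minus, max(p_plus, p_minus)


def worst_noise(assignment):
    return evaluate(assignment)[3]


# Useful-skew view: lowest worst noise among the feasible assignments.
best_useful = min(FEASIBLE, key=worst_noise)

# Global-skew view: search all 4^4 assignments within an ASSUMED bound.
SKEW_BOUND = 2
bounded = [a for a in product(LIBRARY, repeat=4)
           if evaluate(a)[0] <= SKEW_BOUND]
best_bounded = min(bounded, key=worst_noise)

increase = worst_noise(best_bounded) / worst_noise(best_useful) - 1
print(best_useful, worst_noise(best_useful))
print(best_bounded, worst_noise(best_bounded))
print(f"peak noise increase: {increase:.0%}")
```

Under the assumed bound, only two of the 256 assignments pass the global skew check, and both use B2 on $n_1$, $n_2$, and $n_3$. This is why the skew-bounded search cannot reach the inverter-rich assignment #4.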
