

Contents lists available at SciVerse ScienceDirect

### Microprocessors and Microsystems

journal homepage: www.elsevier.com/locate/micpro



# Ultra low energy design exploration of digital decimation filters in 65 nm dual- $V_T$ CMOS in the sub- $V_T$ domain

S.M. Yasser Sherazi\*, Joachim N. Rodrigues, Omer C. Akgun, Henrik Sjöland, Peter Nilsson

Department of Electrical and Information Technology, Lund University, Box 118, SE-221 00 Lund, Sweden

#### ARTICLE INFO

Article history:
Available online 13 April 2012

Keywords:
Energy dissipation
Ultra low power
Decimation filters
Half band filters
65 nm
Sub-threshold
CMOS
Unfolding
Wireless devices
Implantable devices

#### ABSTRACT

This paper presents an analysis of the energy dissipation of a decimation filter chain of four Half Band Digital (HBD) filters operated in the sub-threshold (sub- $V_T$ ) region under throughput constraints. To combat the speed degradation caused by supply voltage scaling, various HBD filters are implemented as unfolded structures. The designs are synthesized in a 65 nm low-power CMOS technology with three threshold options, both as single- $V_T$  and as dual- $V_T$  implementations. A sub- $V_T$  energy model is applied to characterize the designs in the sub- $V_T$  domain. Simulation results show that the architectures unfolded by two and by four are the most energy efficient for throughput requirements between 250 ksamples/s and 2 Msamples/s. With the optimum selection of architectures and standard cells, the simulated minimum energy dissipation per output sample at the required throughput is 164 fJ and 205 fJ, with a single supply voltage of 260 mV.

© 2012 Elsevier B.V. All rights reserved.

#### 1. Introduction

Miniaturized devices have gained importance in medicine, sensor networks, and many other applications. Engineers aim to develop ultra compact circuits with low energy dissipation that may be used in devices such as hearing aids, medical implants, and remote sensors. In these devices, minimal energy dissipation in both active and standby mode is of the highest importance, since it extends battery life. This matters because it is non-trivial to change or charge a battery in a medical implant or in small sensors that are buried or out of reach.

Currently, to achieve low energy dissipation, designers rigorously apply voltage scaling techniques, so that the designed systems operate in the sub-threshold (sub- $V_T$ ) domain [1–3]. In the sub- $V_T$  domain designers have to deal with leakage currents, which are the source of energy dissipation when a circuit is in idle mode [4]. In this regime, severely degraded on/off current ratios  $I_{\rm on}/I_{\rm off}$  and increased sensitivity to process variations are among the main challenges for sub- $V_T$  circuit design [5] in 65 nm technologies and below. This imposes an important design constraint, especially in implantable medical devices. Consequently, designers need to optimize digital designs in terms of energy dissipation and throughput for sub- $V_{\rm T}$  operation.

The target application is an ultra low-power wireless receiver with a digital baseband and design constraints of below 1 mW and 1  $\mu$W power consumption in active and standby mode, respectively. The device needs to handle data rates up to 250 kbits/s, and will be realized on a single chip with an area of 1 mm² in 65 nm CMOS. A block diagram of the receiver system is shown in Fig. 1. It contains an RF front-end (2.5 GHz), an analog-to-digital converter (ADC), a digital baseband for demodulation and control, and finally, a decoder that processes the received data packets. All of these blocks may require their own supply voltages ( $V_{\rm DD}$ ) at different levels, and generating multiple voltage levels on-chip is non-trivial. Therefore, a single  $V_{\rm DD}$  is also a constraint on the system.

Fig. 1. Receiver system.

<sup>\*</sup> Corresponding author.

*E-mail addresses*: yasser.sherazi@eit.lth.se (S.M. Yasser Sherazi), joachim.rodrigues@eit.lth.se (J.N. Rodrigues), omercan.akgun@eit.lth.se (O.C. Akgun), henrik.sjoland@eit.lth.se (H. Sjöland), peter.nilsson@eit.lth.se (P. Nilsson).

Many ultra low power portable devices may benefit from this kind of wireless receiver, e.g., hearing aids that communicate between the two ears to improve binaural hearing. Another example is a neural sensor inside the body that communicates with a robotic arm or leg. If a radio is made sufficiently small and with minimal energy dissipation, there will be vast possibilities for new applications. Medical applications may benefit greatly from ultra low energy dissipating circuits, especially in implanted, home care, surgical, and emergency monitoring [6].

The main focus of this paper is on the digital baseband of the receiver system. The first task of the digital baseband circuit is to re-sample the data received from the ADC from a rate of 4 Msamples/s to 250 ksamples/s. Down-sampling a signal requires anti-aliasing filters; IIR filters are chosen as they can be implemented with fewer coefficients for the required cut-off frequencies. Another property of these filters is that they operate with high stability when the order of the filter is low [7]. Therefore, instead of a single high-order filter, a chain of low-order decimation filters is applied. To achieve lower energy dissipation, voltage scaling is rigorously applied, hence the designed circuits run in the sub- $V_T$  domain [8]. This puts an important design constraint, especially in implantable medical devices. Consequently, there is a need to optimize the circuits in terms of energy dissipation and throughput for sub- $V_T$  operation.
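The rate conversion from 4 Msamples/s to 250 ksamples/s is a factor of 16, which a chain of four decimate-by-two stages covers. The following minimal sketch tabulates the sample rate seen at each point in such a chain. The function name `stage_rates` is hypothetical and only illustrates the arithmetic; it is not taken from the paper.

```python
def stage_rates(input_rate, num_stages, factor=2):
    """Return the sample rate at the chain input and after each stage.

    Assumes every stage decimates by the same integer factor (2 for a
    half band filter) and that the input rate divides evenly.
    """
    rates = [input_rate]
    for _ in range(num_stages):
        rates.append(rates[-1] // factor)
    return rates


# ADC output rate of 4 Msamples/s followed by four decimate-by-two stages
print(stage_rates(4_000_000, 4))
# -> [4000000, 2000000, 1000000, 500000, 250000]
```

Each stage therefore runs at half the rate of the one before it, so the first filter in the chain faces the strictest throughput requirement.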

The 65 nm CMOS technology used offers three threshold options, namely high threshold (HVT), standard threshold (SVT), and low threshold (LVT). The higher the threshold, the lower both the leakage and the speed of the gates. Circuits that are to be operated in the sub- $V_T$  domain are designed to have minimum leakage and a short critical path. This may be obtained by selecting HVT cells for the implementation and incrementally replacing them with faster cells on the critical paths to increase the throughput of the circuit. As shown in [9], unfolding the original architecture of the filter gave better performance with respect to energy per sample operation. However, all the architectures were synthesized using only HVT standard cells. Thereafter, the analysis was extended to the other threshold options available for the 65 nm technology [10], where the circuits were implemented individually with HVT, LVT, and SVT standard cells. The results showed that for moderate throughput requirements the SVT-based implementation gives the lowest energy dissipation.

In the study presented in this paper, the analysis has been extended to dual- $V_{\rm T}$  implementations. This is needed to analyze if any advantage may be achieved by the mix of low leakage cells (HVT) with faster cells (SVT/LVT). Moreover, the energy model proposed in [11] was updated in order to be able to simulate dual- $V_{\rm T}$  implementations.

In Section 2, a 12-bit architecture of a Half Band Digital (HBD) filter, implemented as direct mapped and various unfolded structures is presented. In Section 3, a previously published sub- $V_T$  energy model is updated for dual threshold cell implementations. In Section 4, the simulation results attained from the HBD filters are shown and discussed, and finally, the conclusions are presented in Section 5.

#### 2. Filter architectures

Minimum energy dissipation at medium to high throughput requirements puts stringent constraints on a design. This section presents the HBD filter and the architectural differences between the original and the unfolded versions.

#### 2.1. Half band digital filter

The filter is chosen to be a third-order bi-reciprocal lattice wave digital filter [12]. Wave digital filters are a guaranteed-stable alternative to analog filters. They are modular and possess a high degree of parallelism, which makes them suitable for hardware implementation [7]. An HBD filter is considered highly suitable as a decimator or interpolator for sample rate conversions by a factor of two. Various decimation filter architectures have been reported in [13–17]. However, the benefit of using this type of filter is that all filtering may be performed at low sample rates and with low arithmetic complexity, which makes it a suitable candidate for both low energy dissipation and a low chip area [18]. The transfer function of the proposed filter is

$$H(z) = \frac{1 + 2z^{-1} + 2z^{-2} + z^{-3}}{2 + z^{-2}},\tag{1}$$

with the advantage that the filter coefficients are implemented by simple shift and add operations, which in turn saves area and energy. The magnitude response of this HBD filter is shown in Fig. 2. The cut-off frequency, i.e., the -3 dB frequency, of this filter is half of the Nyquist frequency. The structure of this particular filter implementation was described in [12] and is shown in Fig. 3a, where  $x_{\rm n}$  is the 12-bit input and  $y_{\rm n}$  is the 12-bit output of the filter.
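The claims about Eq. (1) can be checked numerically. The sketch below evaluates the transfer function on the unit circle and runs the equivalent difference equation $2y_n + y_{n-2} = x_n + 2x_{n-1} + 2x_{n-2} + x_{n-3}$. It is a floating-point, direct-form illustration under our own assumptions, not the 12-bit lattice wave digital structure of Fig. 3a, and the function names are hypothetical.

```python
import cmath
import math

NUM = [1, 2, 2, 1]  # numerator of Eq. (1), powers z^0 .. z^-3
DEN = [2, 0, 1]     # denominator of Eq. (1), powers z^0 .. z^-2


def freq_response(omega):
    """Evaluate Eq. (1) at z = exp(j*omega), omega in rad/sample."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(b * z_inv ** k for k, b in enumerate(NUM))
    den = sum(a * z_inv ** k for k, a in enumerate(DEN))
    return num / den


def gain_db_rel_dc(omega):
    """Magnitude in dB, normalized to the gain at DC."""
    return 20 * math.log10(abs(freq_response(omega)) / abs(freq_response(0.0)))


def hbd_filter(samples):
    """Direct-form recursion of Eq. (1) with zero initial state.

    Every coefficient is 1, 2 or 1/2, so a fixed-point version needs
    only shifts and adds; floats are used here for simplicity.
    """
    x = [0, 0, 0] + list(samples)
    y = [0.0, 0.0]  # holds y[n-2], y[n-1] before the first output
    for n in range(3, len(x)):
        acc = x[n] + 2 * x[n - 1] + 2 * x[n - 2] + x[n - 3] - y[-2]
        y.append(acc / 2)
    return y[2:]


# Gain at half the Nyquist frequency (omega = pi/2), relative to DC
print(round(gain_db_rel_dc(math.pi / 2), 2))  # -> -3.01
# First impulse response samples
print(hbd_filter([1, 0, 0, 0, 0]))  # -> [0.5, 1.0, 0.75, 0.0, -0.375]
```

Note that Eq. (1) as written has a DC gain of 2, which is why the gain is reported relative to DC. The numerator also vanishes at $z = -1$, so the response has a zero at the Nyquist frequency.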

An initial analysis indicates that the required throughputs are not achievable by the original architecture, referred to as ORG, if operated in sub- $V_T$ , and therefore, unfolding is applied. Unfolding is a transformation technique that calculates j samples per clock cycle, where j is the unfolding factor. Unfolding has the property of preserving the number of delays in a Data Flow Graph [19]. The ORG filter architecture is unfolded by factors of 2 (UF2), 4 (UF4), and 8 (UF8), see Figs. 3b and 4a and b, respectively. In Fig. 3b,  $x_{jn+i}$  and  $y_{jn+i}$  are the 12-bit inputs and outputs of the filters, where j is the unfolding factor and i corresponds to the ith data sample, with  $i = 0, \ldots, j - 1$ . In all unfolded architectures the total number of registers remains unchanged, whereas the number of adders scales with the unfolding factor. Furthermore, the critical path in UF2 remains unchanged compared to the ORG implementation. The critical path for UF4 and UF8 increases, since the feedback paths lack a register. However, more samples are processed per



Fig. 2. Magnitude response of the HBD filter.
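The effect of unfolding can be illustrated with a behavioral sketch. It models a filter by the difference equation $2y_n + y_{n-2} = x_n + 2x_{n-1} + 2x_{n-2} + x_{n-3}$ and compares a one-sample-per-cycle version with a version that computes j samples per cycle while keeping the same number of state values. This is a direct-form software model written under our own assumptions, with hypothetical function names; it does not reproduce the register placement or adder structure of the unfolded architectures in the figures.

```python
def hbd_step(state, x_n):
    """Compute one output sample and the next state.

    state = (x[n-1], x[n-2], x[n-3], y[n-1], y[n-2])
    """
    x1, x2, x3, y1, y2 = state
    y_n = (x_n + 2 * x1 + 2 * x2 + x3 - y2) / 2
    return (x_n, x1, x2, y_n, y1), y_n


def run_original(samples):
    """One sample per clock cycle."""
    state = (0, 0, 0, 0.0, 0.0)
    out = []
    for x_n in samples:
        state, y_n = hbd_step(state, x_n)
        out.append(y_n)
    return out


def run_unfolded(samples, j):
    """Compute j samples per clock cycle.

    The inner loop stands for j chained copies of the combinational
    logic; only the state after the last copy is stored, so the number
    of state values is the same as in run_original.
    """
    if len(samples) % j:
        raise ValueError("number of samples must be a multiple of j")
    state = (0, 0, 0, 0.0, 0.0)
    out = []
    cycles = 0
    for start in range(0, len(samples), j):
        for i in range(j):  # sample index j*n + i within cycle n
            state, y = hbd_step(state, samples[start + i])
            out.append(y)
        cycles += 1
    return out, cycles


samples = list(range(16))
reference = run_original(samples)
for j in (2, 4, 8):
    unfolded, cycles = run_unfolded(samples, j)
    print(j, cycles, unfolded == reference)
```

In this model the outputs are identical for every j, while the number of clock cycles drops by the factor j. The cost is that j copies of the arithmetic are evaluated within one cycle, which is the trade-off between adder count, critical path, and throughput discussed in the text.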
