ELSEVIER

Contents lists available at ScienceDirect

## INTEGRATION, the VLSI journal

journal homepage: www.elsevier.com/locate/vlsi



# An energy-efficient, high-precision SFP LPFIR filter engine for digital hearing aids



Shih-Hao Ou, Kuo-Chiang Chang, Chih-Wei Liu\*

Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan

#### ARTICLE INFO

Article history:
Received 19 October 2012
Received in revised form 17 June 2014
Accepted 17 June 2014
Available online 28 June 2014

Keywords:
Static floating-point arithmetic
Cascaded datapath
Linear-phase FIR filter
Hearing aid
ANSI S1.11 1/3-octave filter bank

#### ABSTRACT

The main contribution of this study is the development of an area-/energy-efficient cascaded direct-truncation (DT) datapath with the so-called static floating-point (SFP) arithmetic to realize a low-delay analysis filter bank (AFB) for digital hearing aids. The proposed SFP LPFIR (linear-phase finite impulse response) filter engine occupies less silicon area and consumes less power than the conventional post-truncation (PT) datapath with integer arithmetic, while achieving better SNR performance. The proposed LPFIR filter engine uses a cascaded 16-bit SFP A-M-S-Acc datapath that consists of two embedded 1-bit shifters to improve hardware usage and parallelism, one 16-bit DT adder (A), one 16-bit DT multiplier (M), one 16-bit barrel shifter (S), and one 16-bit DT accumulator (Acc). The operations per cycle (OPC) of the proposed SFP LPFIR filter engine reach 6, which enables efficient implementation of the low-latency AFB for hearing aids. To verify the effectiveness of the proposed 16-bit SFP LPFIR filter engine, a 10-ms 18-band quasi-ANSI S1.11 1/3-octave AFB for digital hearing aids was implemented using UMC 90-nm CMOS technology. The AFB was operated at 792 kHz to process 24 kHz audio in real time, with a power consumption of approximately 80.6 µW (at 1 V). Compared to the previous design in which the conventional PT datapath with integer arithmetic was used, approximately 9.6% of total power and 8.3% of silicon area were saved and almost the same SNR (signal-to-noise ratio) performance was achieved with the new system, when evaluated by a 3.96-s sequence of Mandarin speech.

© 2014 Elsevier B.V. All rights reserved.

#### 1. Introduction

Fig. 1 illustrates the abstract architectural datapath of three distinct types of processor: the scalar processor, the multi-issue processor, and the application-specific instruction set processor (ASIP<sup>1</sup>) or customized DSP. In the figure, the functional units (FUs) are labeled "A," "M," "S," and "Acc" for the adder, multiplier, shifter, and accumulator, respectively [1,2]. For the scalar processor with A-M-S FUs in Fig. 1(a), three read/write ports (2R-1W) are allocated to access the centralized register file (RF). The OPC (operations per cycle) of the scalar processor is at most one because at most one FU is activated (or issued) in each instruction cycle. The datapath of the scalar processor therefore suffers from low hardware usage, which is equal to or less than 1/N if N FUs are available, and the clock frequency must be increased to improve performance. Conversely, the datapath of the N-issue VLIW or superscalar processor integrates N independent FUs, all of which can be issued concurrently. With the parallel A-M-Acc datapath shown in Fig. 1(b), the maximal OPC of the multi-issue processor is three. By exploiting instruction-level parallelism, the multi-issue processor improves throughput without increasing frequency. Nevertheless, the number of access ports to the centralized RF increases as more FUs are allocated. In the example shown in Fig. 1(b), eight ports (i.e., 5R-3W) are required for the 3-issue processor to access the centralized RF. As described in [3], the area, latency, and power of the multi-issue processor with centralized RF increase as  $P^3$ ,  $P^{3/2}$ , and  $P^3$ , respectively, for P access ports. An alternative datapath with high OPC is shown in Fig. 1(c). Based on the regular and repetitive characteristics of DSP algorithms, e.g., DCT and FFT, the datapath of an ASIP (or DSP) can be customized [2,4] to perform the composite operations that occur frequently in the algorithm. The cascaded A-M-Acc datapath shown in Fig. 1(c), for example, is suitable for implementing the linear-phase finite-impulse response (LPFIR) filter [4,5]. The maximum OPC of the cascaded datapath shown in Fig. 1(c) remains three, whereas the number of ports allocated to the centralized RF is only four.

<sup>1</sup> The full list of all acronyms is given at the end of the article.
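The port counts and scaling laws quoted above can be put side by side with a few lines of arithmetic. The following is a minimal sketch: the port counts and maximum OPC values come from Fig. 1 and the exponents from [3], while the function name `rf_cost` and the choice to normalize against the 3-port scalar case are illustrative assumptions.

```python
# Relative cost of a centralized register file (RF) as a function of its
# port count P, using the scaling quoted in the text:
#   area ~ P^3, latency ~ P^(3/2), power ~ P^3.
# Normalizing to the 3-port scalar datapath is an illustrative choice.

def rf_cost(ports: int, baseline_ports: int = 3) -> dict:
    """Return RF area, latency, and power relative to the baseline port count."""
    ratio = ports / baseline_ports
    return {"area": ratio ** 3, "latency": ratio ** 1.5, "power": ratio ** 3}

# Port counts and maximum OPC as given for Fig. 1(a)-(c).
datapaths = {
    "scalar A-M-S (2R-1W)": {"ports": 3, "max_opc": 1},
    "parallel 3-issue A-M-Acc (5R-3W)": {"ports": 8, "max_opc": 3},
    "cascaded A-M-Acc": {"ports": 4, "max_opc": 3},
}

for name, d in datapaths.items():
    cost = rf_cost(d["ports"])
    print(
        f"{name}: max OPC {d['max_opc']}, "
        f"area x{cost['area']:.2f}, "
        f"latency x{cost['latency']:.2f}, "
        f"power x{cost['power']:.2f}"
    )
```

Under these scaling laws, the cascaded datapath reaches the same maximum OPC as the parallel one while its RF cost stays much closer to the scalar baseline.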

<sup>\*</sup> Corresponding author. Tel.: +886 3 573 1685; fax: +886 3 571 0580. E-mail address: cwliu@twins.ee.nctu.edu.tw (C.-W. Liu).

Fig. 1. Abstract datapath of (a) a scalar processor, (b) a parallel multi-issue processor, and (c) a customized ASIP or DSP.

In direct contrast to the parallel datapath, the cascaded datapath maintains comparable OPC with limited access ports to the centralized RF. Consequently, the area, latency, and power of the ASIP do not increase substantially when additional FUs are allocated. However, the design challenge of the cascaded datapath is the high-level synthesis of a given DSP algorithm [4], because a datapath with an inadequate cascading order will lead to low hardware usage. Moreover, the cascaded datapath results in a long critical path [4] and, therefore, direct-truncation (DT) FUs are used in certain low-power DSP applications [2]. Compared to post-truncation (PT) or full-precision (FP) FUs, the cascaded DT datapath is fast and inexpensive, but it sacrifices precision, which might severely degrade performance.
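To make the trade-off concrete, the sketch below runs a symmetric (linear-phase) FIR tap computation through an adder, a multiplier, and an accumulator in cascade, once with truncation after every product and once with a single truncation at the end. It rests on assumptions that are not taken from the paper: Q15 integer operands, DT modeled as dropping the low bits of each product immediately after the multiplier, PT modeled as full-width accumulation followed by one truncation, and no modeling of word-length overflow or saturation. The function names are hypothetical.

```python
# Folded linear-phase FIR on a cascaded A-M-Acc datapath (illustrative only).
# Assumptions: Q15 integers, symmetric taps h[k] == h[N-1-k],
# DT = truncate every product right after the multiplier,
# PT = keep full-width products and truncate once after accumulation.
# Overflow and saturation are deliberately not modeled.

FRAC_BITS = 15

def fir_direct(h, window):
    """Reference N-tap FIR: one full-width multiply per tap, no truncation."""
    return sum(hk * xk for hk, xk in zip(h, window))

def lpfir_folded(h, window, direct_truncation):
    """Exploit tap symmetry: pre-add mirrored samples (A), multiply (M), accumulate (Acc)."""
    n = len(h)
    acc = 0
    for k in range(n // 2):
        pre_add = window[k] + window[n - 1 - k]   # adder stage
        product = h[k] * pre_add                  # multiplier stage
        if direct_truncation:
            product >>= FRAC_BITS                 # DT: drop low bits immediately
        acc += product                            # accumulator stage
    if n % 2:                                     # unpaired middle tap for odd N
        product = h[n // 2] * window[n // 2]
        if direct_truncation:
            product >>= FRAC_BITS
        acc += product
    return acc if direct_truncation else acc >> FRAC_BITS  # PT: truncate once

# Hypothetical 5-tap symmetric filter and input window in Q15.
h = [1000, -3000, 12000, -3000, 1000]
window = [1200, -800, 30000, 500, -20000]

pt_result = lpfir_folded(h, window, direct_truncation=False)
dt_result = lpfir_folded(h, window, direct_truncation=True)
print("PT:", pt_result, "DT:", dt_result, "difference:", pt_result - dt_result)
```

The folded form needs only about half as many multiplications as the direct form, which is why the A-M-Acc ordering suits LPFIR filters. In this model the DT result can fall below the PT result by up to one least-significant bit per truncated product, and that gap grows with the number of taps.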

If the clock rate of an applied DSP is not a strict constraint, as in digital hearing aids [6-8], then FP or PT FUs are usually used in commercial hearing-aid DSPs for high precision [9]. The DSP in a digital hearing aid is generally less powerful than those in consumer applications [6,8]. For example, the computing power of the fastest DSP for a hearing aid [9-13] is hundreds of times lower than that of the processor in the Microsoft Xbox 360. However, the processor in the Xbox 360 draws almost 20 W, whereas the DSP in a digital hearing aid consumes < 1 mW [8]. CoolFlux DSP [10] allocates a 24-bit FP datapath with integer arithmetic for low-power audio applications; the datapath consists of three FP adders, two FP multipliers, two truncation units, and one long bit-width (40-bit) accumulator. This complicated datapath supports single-cycle 24-bit dual-MAC (multiply-and-accumulate) operation for high-precision audio processing. Diverse speech-/audio-centric applications are targeted by the On Semiconductor R3910 DSP [11], a pre-configurable hearing-aid computing platform in which a 20-bit FP integer datapath is used. Moreover, On Semiconductor provides BelaSigna 200 [12] and BelaSigna 300 [13], both of which support single-cycle high-precision MAC operation. To achieve distinct levels of precision, a 16-bit FP integer datapath is allocated in BelaSigna 200, whereas a 24-bit datapath is used in BelaSigna 300. Compared to the DT datapath with integer arithmetic, the FP datapath has the drawback of a large silicon area and extra energy (or power) dissipation [5].

The dual requirements of small area and low power consumption in high-precision DSPs used for hearing aids complicate the design considerably. To implement a low-latency LPFIR analysis filter bank (AFB) for digital hearing aids [26], this study developed a high-precision and energy-/area-efficient cascaded DT datapath, which applies the so-called static floating-point (SFP) arithmetic [14]. The SFP arithmetic enables the optimal compromise between floating-point and integer arithmetic to be identified. With SFP arithmetic, the signal is matched readily to the full-scale range of the fixed-width quantizer and thereby optimal SQNR (signal-to-quantization-noise ratio) performance is achieved. When evaluated using the 39th, 30th, and 27th ANSI S1.11 1/3-octave filters  $F_{39}$ ,  $F_{30}$ , and  $F_{27}$  for digital hearing aids [25,26], which are 27-tap, 67-tap, and 97-tap LPFIR filters, respectively, the fixed 16-bit SFP arithmetic achieves an additional 18.1, 23.7, and 30.1 dB SQNR gain compared with the 16-bit integer arithmetic. The cost of applying the SFP arithmetic is a large increase in the shift operations required for aligning the operands and normalizing the results for every data manipulation. However, these shift operations can be decided statically and scheduled appropriately at design time, instead of being resolved dynamically [14]. Moreover, most of the inserted shift operations in the SFP arithmetic are only 1-bit shifts.
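The idea of matching a signal to the full-scale range of a fixed-width quantizer with a shift chosen at design time can be illustrated with a small experiment. This is a sketch of one possible reading of SFP, not the scheme of [14]: it assumes a 16-bit signed word, truncation toward negative infinity, a single static exponent per coefficient set, and a hypothetical set of 27 small-amplitude coefficients generated from a sine. The resulting gain is specific to this toy data and is not meant to reproduce the SQNR figures quoted above.

```python
import math

# Static scaling versus plain Q15 quantization in a 16-bit word (illustrative).
# Assumptions: signed 16-bit storage, truncation (floor), and one static
# exponent per coefficient set that is fixed at design time.

WORD_BITS = 16
MAX_INT = (1 << (WORD_BITS - 1)) - 1
MIN_INT = -(1 << (WORD_BITS - 1))

def quantize(value, frac_bits):
    """Truncate value to a signed 16-bit integer with frac_bits fractional bits."""
    q = math.floor(value * (1 << frac_bits))
    return max(MIN_INT, min(MAX_INT, q))

def static_frac_bits(values):
    """Design-time choice: the most fractional bits that still avoid overflow."""
    peak = max(abs(v) for v in values)
    return math.floor(math.log2(MAX_INT / peak))

def sqnr_db(values, frac_bits):
    """Signal-to-quantization-noise ratio in dB for the given scaling."""
    scale = 1 << frac_bits
    signal = sum(v * v for v in values)
    noise = sum((v - quantize(v, frac_bits) / scale) ** 2 for v in values)
    return 10 * math.log10(signal / noise)

# Hypothetical small-amplitude coefficient set (peak just under 0.01).
coeffs = [0.01 * math.sin(0.37 * k + 0.2) for k in range(1, 28)]

integer_frac_bits = 15                      # conventional Q15 integer format
sfp_frac_bits = static_frac_bits(coeffs)    # static exponent fixed at design time

gain_db = sqnr_db(coeffs, sfp_frac_bits) - sqnr_db(coeffs, integer_frac_bits)
print(f"static fractional bits: {sfp_frac_bits}, SQNR gain: {gain_db:.1f} dB")
```

Because the exponent is a constant known at design time, the alignment shifts it implies can be scheduled statically rather than computed per sample, which is the property the text relies on.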

The proposed area-/energy-efficient and high-precision SFP LPFIR filter engine contains two embedded 1-bit shifters, one 16-bit DT adder (A), one 16-bit DT multiplier (M), one 16-bit barrel shifter (S), and one 16-bit DT accumulator (Acc). The two internal 1-bit shifters improve both hardware usage and parallelism of the proposed DT SFP A-M-S-Acc datapath. When examined using the 10-ms, 18-band quasi-ANSI S1.11 1/3-octave AFB for digital hearing aids [26], the hardware usage of the proposed SFP LPFIR filter engine ranged between 0.83 and 0.9. For a fair comparison, this study reimplemented the 10-ms, 18-band quasi-ANSI S1.11 1/3-octave LPFIR AFB designed in [26] using UMC 90-nm CMOS high-VT technology. The synthesis constraint was set to "minimum area" and the wire-load model "wl10" was applied. With the parallel PT integer datapath supporting 8-MAC operation in one cycle [26], the implemented AFB has an area of 41,533 (2-input NAND) gates and was operated at 792 kHz for real-time processing of 24 kHz audio. The estimated power consumption of the AFB test chip was approximately 87.9 µW (at 1 V) when evaluated using a 3.96-s sequence of Mandarin speech. By contrast, when implemented using the proposed 8-parallel, 16-bit DT SFP A-M-S-Acc datapath instead, the silicon area was approximately 37,542 gates, an approximately 8.3% reduction in silicon area. When examined using the same speech sequence, it consumed 80.6 µW, an approximately 9.6% saving in power while achieving the same SNR (signal-to-noise ratio) performance. Simulation results verified the effectiveness of the proposed low-cost, high-precision SFP LPFIR filter engine for implementing a low-delay quasi-ANSI AFB for advanced hearing aids.
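The headline figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the clock rate, sampling rate, gate counts, and power values quoted above; the helper name `saving_percent` is illustrative. Computed this way, the gate-count reduction comes out near 9.6% and the power reduction near 8.3%, so the two percentages in the text appear to be attached to the opposite quantities. That reading is an inference from the quoted raw figures, not something the paper states.

```python
# Cross-check of the quoted implementation figures (values from the text).

clock_hz = 792_000          # AFB operating frequency
sample_rate_hz = 24_000     # audio sampling rate

# Clock cycles available to process each audio sample in real time.
cycles_per_sample = clock_hz // sample_rate_hz

gates_pt, gates_sfp = 41_533, 37_542        # 2-input NAND gate equivalents
power_pt_uw, power_sfp_uw = 87.9, 80.6      # microwatts at 1 V

def saving_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before

area_saving = saving_percent(gates_pt, gates_sfp)
power_saving = saving_percent(power_pt_uw, power_sfp_uw)

print(f"cycles per sample: {cycles_per_sample}")
print(f"gate-count reduction: {area_saving:.1f}%")
print(f"power reduction: {power_saving:.1f}%")
```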

The rest of this paper is organized as follows. Section 2 introduces the proposed low-delay LPFIR AFB for digital hearing aids and Section 3 describes the SFP arithmetic and its applications. The details of the design concept, architecture, and methodology of the proposed 16-bit DT SFP A-M-S-Acc datapath are described in Section 4, which also presents simulation and comparison results that verify the success of the proposed architecture. Finally, Section 5 presents the conclusions.
