The Wavelet and Fourier Transforms in Feature Extraction for Text-Dependent, Filterbank-Based Speaker Recognition

Article ID	Journal	Published Year	Pages	File Type
489067	Procedia Computer Science	2011	6 Pages	PDF

Abstract

An important step in speaker recognition is extracting features from raw speech that capture the unique characteristics of each speaker. The most widely used method of obtaining these features is the filterbank-based Mel Frequency Cepstral Coefficients (MFCC) approach. A key step in this process is typically the use of the discrete Fourier transform (DFT) to compute the spectrum of the speech waveform. Over the past few years, however, the discrete wavelet transform (DWT) has gained remarkable attention and has been favored over the DFT in a wide variety of applications. This work compares the performance of the DFT with that of the DWT in the computation of MFCCs during feature extraction for speaker recognition. It is shown that the DWT yields a significantly lower order for the Gaussian Mixture Model (GMM) used to model speech, together with a marginal improvement in accuracy.
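
To make the pipeline described in the abstract concrete, the following is a minimal sketch of filterbank-based MFCC extraction for a single speech frame using the DFT, together with one level of a Haar DWT for comparison. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not specify the wavelet family, the number of filters, the number of cepstral coefficients, or how the DWT coefficients are fed to the filterbank. The function names, the Hamming window, the Haar wavelet, and the defaults of 20 filters and 12 coefficients are all choices made here for illustration.

```python
import cmath
import math


def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge points, expressed as fractional DFT bin positions.
    edges = [mel_to_hz(top * i / (num_filters + 1)) * n_fft / sample_rate
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    """DFT-based MFCCs of one frame (parameter defaults are assumptions)."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    bank = mel_filterbank(num_filters, n, sample_rate)
    # Log filterbank energies; the small constant guards against log(0).
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    # DCT-II of the log energies; coefficient 0 is skipped here by choice.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_e))
            for c in range(1, num_coeffs + 1)]


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


# Hypothetical test frame: 256 samples of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
coeffs = mfcc(frame, 8000)
approx, detail = haar_dwt(frame)
```

Here `coeffs` is the feature vector for one frame, of the kind that would be passed on to a GMM. In a DWT-based variant, the spectrum fed to the filterbank would be derived from wavelet coefficients such as `approx` and `detail` rather than from `power_spectrum`; how that substitution is made is not stated in the abstract and is left open in this sketch.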