Article ID: 566050
Journal: Speech Communication
Published Year: 2012
Pages: 9
File Type: PDF
Abstract

Several studies have shown that the amplitude of the first rahmonic peak (R1) in the cepstrum is a useful indicator of hoarse voice quality. The cepstrum is obtained by taking the inverse Fourier transform of the log-magnitude spectrum. The present study investigates a number of spectral pre-processing steps applied before the cepstrum is computed: period-synchronous, period-asynchronous, harmonic-synchronous and harmonic-asynchronous spectral band-limitation analysis. The analysis is applied to both sustained vowel [a] and connected speech signals. The correlation between R1 and perceptual ratings is examined for a corpus comprising 251 speakers. The correlation between R1 and perceptual ratings is observed to increase when the spectrum is band-limited prior to computing the cepstrum. In addition, comparisons are made with a previously reported cepstral cue, cepstral peak prominence (CPP).
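The cepstrum definition above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `synth_voice` and `first_rahmonic`, the Hann window, the 60-400 Hz search range, and the simple "keep bins up to `band_limit_hz`" truncation are all assumptions made for illustration, and the period-synchronous and harmonic-synchronous variants named in the abstract are not reproduced here.

```python
import numpy as np

def synth_voice(f0, fs, n, noise_std, seed=0):
    """Hypothetical test signal: harmonics of f0 (1/k amplitudes) plus white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    n_harm = int((fs / 2 - 1) // f0)            # keep all harmonics below Nyquist
    k = np.arange(1, n_harm + 1)[:, None]       # column of harmonic numbers
    harmonic = np.sum(np.sin(2 * np.pi * f0 * k * t) / k, axis=0)
    return harmonic + noise_std * rng.standard_normal(n)

def first_rahmonic(frame, fs, f0_range=(60.0, 400.0), band_limit_hz=None):
    """Return (R1 amplitude, f0 estimate in Hz) from the real cepstrum of one frame.

    Assumption: band limitation is done by discarding log-spectrum bins above
    band_limit_hz before the inverse transform.
    """
    n = len(frame)
    # Log-magnitude spectrum of the windowed frame (small offset avoids log(0)).
    log_mag = np.log(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-12)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    if band_limit_hz is not None:
        keep = freqs <= band_limit_hz
        log_mag, freqs = log_mag[keep], freqs[keep]
    # Inverse Fourier transform of the log-magnitude spectrum gives the cepstrum.
    cepstrum = np.fft.irfft(log_mag)
    # Truncating the spectrum changes the quefrency step to 1 / (2 * top frequency).
    eff_fs = 2.0 * freqs[-1]
    k_lo = max(1, int(eff_fs / f0_range[1]))
    k_hi = min(int(eff_fs / f0_range[0]), len(cepstrum) // 2)
    peak = k_lo + int(np.argmax(cepstrum[k_lo:k_hi + 1]))
    return float(cepstrum[peak]), eff_fs / peak
```

On synthetic signals of this kind, a strongly periodic frame should give a larger R1 than the same frame buried in noise, which is the intuition behind using R1 as a hoarseness cue. Real recordings would additionally need framing, voicing decisions, and averaging across frames, none of which are shown here.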

► The amplitude of the first rahmonic peak is obtained for connected speech and sustained vowels.
► The amplitude of the first rahmonic peak correlates with perceived hoarseness.
► Period-synchronous and harmonic-limited analyses increase the correlation.
► The amplitude of the first rahmonic peak is compared with cepstral peak prominence.

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing