Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
566050 | Speech Communication | 2012 | 9 Pages |
A number of studies have shown that the amplitude of the first rahmonic peak (R1) in the cepstrum can be usefully employed to indicate hoarse voice quality. The cepstrum is obtained by taking the inverse Fourier transform of the log-magnitude spectrum. The present study investigates a number of spectral pre-processing steps applied prior to computing the cepstrum: period-synchronous, period-asynchronous, harmonic-synchronous and harmonic-asynchronous spectral band-limitation analysis. The analysis is applied to both sustained vowel [a] and connected speech signals. The correlation between R1 and perceptual ratings is examined for a corpus comprising 251 speakers. It is observed that the correlation between R1 and perceptual ratings increases when the spectrum is band-limited prior to computing the cepstrum. In addition, comparisons are made with a previously reported cepstral cue, cepstral peak prominence (CPP).
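The cepstrum definition above, together with the idea of band-limiting the spectrum first, can be illustrated with a minimal Python sketch. It is not the study's implementation: the Hann window, the 60 to 400 Hz pitch search range, the 4 kHz cutoff, the synthetic test signal and all function names (`cepstrum`, `first_rahmonic`, `synthetic_vowel`) are assumptions made for illustration. Band-limitation is read here simply as discarding spectral bins above a cutoff; the period-synchronous and harmonic-synchronous variants are not reproduced.

```python
import numpy as np


def cepstrum(frame, fs, f_cut=None, eps=1e-12):
    """Return (real cepstrum, quefrency step in seconds) for one even-length frame."""
    n = len(frame)
    # Log-magnitude of the one-sided spectrum (Hann window is an assumption).
    log_mag = np.log(np.abs(np.fft.rfft(frame * np.hanning(n))) + eps)
    if f_cut is not None:
        # Assumed band-limitation: keep only the bins from 0 Hz up to f_cut.
        log_mag = log_mag[: int(round(f_cut * n / fs)) + 1]
    # Inverse transform of the (symmetric, real) log-magnitude spectrum.
    cep = np.fft.irfft(log_mag)
    # Spectral bin spacing is fs / n, so the quefrency step is n / (len(cep) * fs).
    return cep, n / (len(cep) * fs)


def first_rahmonic(frame, fs, f_cut=None, f0_min=60.0, f0_max=400.0):
    """Return (R1 amplitude, R1 quefrency in seconds) within an assumed pitch range."""
    cep, dq = cepstrum(frame, fs, f_cut)
    lo = int(np.ceil(1.0 / (f0_max * dq)))   # shortest pitch period searched
    hi = int(np.floor(1.0 / (f0_min * dq)))  # longest pitch period searched
    peak = lo + int(np.argmax(cep[lo:hi + 1]))
    return float(cep[peak]), peak * dq


def synthetic_vowel(fs=16000, f0=125.0, n=1024, noise=0.01, seed=0):
    """Hypothetical test signal: harmonics with 1/k amplitudes plus white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    k_max = int(fs / 2 / f0) - 1
    voiced = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, k_max + 1))
    return voiced + noise * rng.standard_normal(n)


fs = 16000
frame = synthetic_vowel(fs)
r1_full, q_full = first_rahmonic(frame, fs)                # full-band cepstrum
r1_band, q_band = first_rahmonic(frame, fs, f_cut=4000.0)  # band-limited cepstrum
print(f"full band:    R1={r1_full:.3f} at {q_full * 1e3:.2f} ms")
print(f"band-limited: R1={r1_band:.3f} at {q_band * 1e3:.2f} ms")
```

For a periodic signal, the R1 quefrency should sit near the pitch period (8 ms for the 125 Hz test signal). Note that truncating the spectrum changes the quefrency step, which is why `cepstrum` returns it alongside the cepstrum.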
► The amplitude of the first rahmonic peak is obtained for connected speech and sustained vowels.
► The amplitude of the first rahmonic peak correlates with perceived hoarseness.
► Period-synchronous and harmonic-limited analyses increase the correlation.
► The amplitude of the first rahmonic peak is compared with cepstral peak prominence.