Cepstral Peak Sensitivity: A Theoretic Analysis and Comparison of Several Implementations

Article ID	Journal	Published Year	Pages	File Type
1101273	Journal of Voice	2015	12 Pages	PDF

Abstract

Summary

Objective. The aim of this study was to develop a theoretic analysis of the cepstral peak (CP), to compare several CP software programs, and to propose methods for reducing variability in CP estimation.

Study Design. Descriptive, experimental study.

Methods. The theoretic CP value of a pulse train was derived and compared with estimates computed for pulse train WAV files using available CP software programs: (1) Hillenbrand's CP prominence (CPP) software (Western Michigan University, Kalamazoo, MI), (2) KayPENTAX (Montvale, NJ) Multi-Speech implementation of CPP, and (3) a MATLAB (The Mathworks, Natick, MA, version R2014a) implementation using cepstral interpolation. The CP variation was also investigated for synthetic breathy vowels.

Results. For pulse trains with period T samples, the theoretic CP is 1/2 + ε/T, |ε| < 0.1 for all pulse trains (ε = 0 for integer T). For fundamental frequencies between 70 and 230 Hz, the CP mean ± standard deviation was 0.496 ± 0.002 using cepstral interpolation and 0.29 ± 0.03 using Hillenbrand's software, whereas CPP was 35.0 ± 3.8 dB using Hillenbrand's software and 20.5 ± 2.7 dB using KayPENTAX's software. The CP and CPP versus signal-to-noise ratio for synthetic breathy vowels were fit to a logistic model for the Hillenbrand (R² = 0.92) and KayPENTAX (R² = 0.82) estimators as well as an ideal estimator (R² = 0.98), which used a period-synchronous analysis.

Conclusions. The findings indicate that several variables unrelated to the signal itself impact CP values, with some factors introducing large variability in CP values that would otherwise be attributed to the signal (eg, voice quality). Variability may be reduced by using a period-synchronous analysis with Hann windows.
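
As an illustration of why a CP near 1/2 is plausible for a pulse train with integer period T, the following is a minimal sketch and not the authors' derivation or any of the compared programs. It assumes the real cepstrum is defined as the inverse DFT of the natural-log magnitude spectrum, and it uses a slightly decaying pulse train (decay factor `decay`, a hypothetical parameter introduced here) so that the log spectrum stays finite. The frame length is an exact multiple of the period, in the spirit of a period-synchronous analysis. Under these assumptions the cepstral value at quefrency T is approximately `decay / 2`, which approaches 1/2 as the decay factor approaches 1. All function names are illustrative.

```python
import numpy as np


def real_cepstrum(x):
    """Real cepstrum, assumed here to be the inverse DFT of ln|DFT(x)|."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum))).real


def decaying_pulse_train(period, n_pulses, decay):
    """Pulses of amplitude decay**k at samples k*period, k = 0..n_pulses-1.

    The frame length is period * n_pulses, i.e. a whole number of periods.
    The decay (< 1) is an assumption of this sketch; it keeps the magnitude
    spectrum strictly positive so the logarithm is well defined.
    """
    x = np.zeros(period * n_pulses)
    x[::period] = decay ** np.arange(n_pulses)
    return x


def cepstral_peak(x, period):
    """Cepstral value at the quefrency equal to the (integer) period."""
    return real_cepstrum(x)[period]


if __name__ == "__main__":
    for decay in (0.9, 0.99, 0.999):
        x = decaying_pulse_train(period=100, n_pulses=20000, decay=decay)
        print(decay, cepstral_peak(x, period=100))
```

The printed values should track `decay / 2` closely, provided `decay ** n_pulses` is negligible. This sketch covers only integer periods and applies no windowing, so it says nothing about the ε/T term for non-integer T or about the window effects the abstract attributes to the compared implementations.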

Keywords

CPP