Article code: 568626
Journal code: 1452037
Publication year: 2014
English article: 16 pages, PDF
Full-text version: free download
English title of the ISI article
Measurement of signal-to-noise ratio in dysphonic voices by image processing of spectrograms
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
Abstract


• Algorithm tested with vowels corrupted by perturbations and known noise levels.
• Notably accurate for noisy signals (S2NR in the 5–35 dB range).
• Nonparadoxical, i.e., results not strongly dependent on jitter and shimmer levels.
• Potentially useful for running speech.
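
The first highlight, testing with vowels corrupted by perturbations and known noise levels, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' synthesizer: the function names `synth_vowel_like` and `add_noise_at_snr`, the uniform random per-cycle perturbation used for jitter and shimmer, the 1/h harmonic roll-off, and the use of white Gaussian noise.

```python
import numpy as np


def synth_vowel_like(f0=120.0, jitter=0.01, shimmer=0.05, dur=0.5,
                     fs=16000, n_harm=10, seed=0):
    """Harmonic tone built cycle by cycle (hypothetical stand-in for a vowel).

    jitter  : maximum relative per-cycle period perturbation (0.03 = 3%)
    shimmer : maximum relative per-cycle amplitude perturbation (0.30 = 30%)
    """
    rng = np.random.default_rng(seed)
    target = int(dur * fs)
    cycles, total = [], 0
    while total < target:
        # Assumed perturbation model: uniform in [-1, 1] times the level.
        period = (1.0 + jitter * rng.uniform(-1.0, 1.0)) / f0
        amp = 1.0 + shimmer * rng.uniform(-1.0, 1.0)
        n = max(2, int(round(period * fs)))
        phase = 2.0 * np.pi * np.arange(n) / n
        # One period of a harmonic series with 1/h amplitude roll-off.
        cycle = sum(np.sin(h * phase) / h for h in range(1, n_harm + 1))
        cycles.append(amp * cycle)
        total += n
    return np.concatenate(cycles)[:target]


def add_noise_at_snr(clean, snr_db, seed=1):
    """Add white Gaussian noise scaled so the mix has exactly snr_db."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(clean.shape)
    gain = np.sqrt(np.mean(clean ** 2)
                   / (np.mean(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    scaled = gain * noise
    return clean + scaled, scaled
```

Because the noise gain is derived from the measured power of the clean signal, the ratio of clean power to noise power matches the requested value by construction, which is what makes such signals usable as a reference when checking an estimator.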

The measurement of glottal noise was investigated in human and synthesized dysphonic voices by means of two-dimensional (2D) speech processing. A prime objective was the reduction of measurement sensitivities to fundamental frequency (f0) tracking errors and phonatory aperiodicities. An available fingerprint image enhancement algorithm was used for signal-to-noise measurement in narrow band spectrographic images. This spectrographic signal-to-noise ratio estimation method (S2NR) creates binary masks, mainly based on the orientation field of the partials, to separate energy in regions with strong harmonics from energy in noisy areas. Synthesized vowels with additive noise were used to calibrate the algorithm, validate the calibration, and systematically evaluate its dependence on f0, shimmer (cycle-to-cycle amplitude perturbation), and jitter (cycle-to-cycle f0 perturbation). In synthesized voices with known signal-to-noise ratios in the 5–40 dB range, S2NR estimates were, on average, accurate within ±3.2 dB and robust to variations in f0 (120 Hz or 220 Hz), jitter (0–3%), and shimmer (0–30%). In human /a/ produced by dysphonic speakers, S2NR values and perceptual ratings of breathiness revealed a non-linear but monotonic decay of S2NR with increased breathiness. Comparison between S2NR and related acoustic measurements indicated similar behaviors regarding the relationship with breathiness and immunity to shimmer, but the other methods had marked influence of jitter. Overall, the S2NR method did not rely on accurate f0 estimation, was robust to vocal perturbations and largely independent of vowel type, having also potential application in running speech.
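
The core idea in the abstract, a binary mask that separates energy in harmonic regions from energy in noisy regions, can be sketched as a simple energy ratio. This is only an illustration under stated assumptions: the function name `masked_snr_db` is hypothetical, the mask is supplied by the caller rather than derived from an orientation field, and the calibration step the abstract mentions is not reproduced.

```python
import numpy as np


def masked_snr_db(power_spec, harmonic_mask):
    """Ratio, in dB, of spectrogram energy inside vs. outside a binary mask.

    power_spec    : 2D array of non-negative power values (frequency x time)
    harmonic_mask : 2D boolean array, True where harmonics dominate
    """
    power_spec = np.asarray(power_spec, dtype=float)
    harmonic_mask = np.asarray(harmonic_mask, dtype=bool)
    signal_energy = power_spec[harmonic_mask].sum()
    noise_energy = power_spec[~harmonic_mask].sum()
    return 10.0 * np.log10(signal_energy / noise_energy)


# Toy spectrogram: rows 0 and 2 act as "partials", rows 1 and 3 as noise.
toy_spec = np.array([[100.0] * 4, [1.0] * 4, [100.0] * 4, [1.0] * 4])
toy_mask = np.zeros_like(toy_spec, dtype=bool)
toy_mask[[0, 2], :] = True
toy_snr = masked_snr_db(toy_spec, toy_mask)  # 800 / 8 = 100, i.e. 20 dB
```

The sketch takes the mask as given; in the method described above, the mask comes mainly from the orientation field of the partials in a narrow band spectrographic image, which is what removes the dependence on accurate f0 tracking.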

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volumes 61–62, June–July 2014, Pages 17–32