Article code: 565917
Journal code: 1452041
Publication year: 2014
English article: 14 pages, PDF
Full-text version: free download
English title of the ISI article
Detection of speaker individual information using a phoneme effect suppression method
Persian translation of the title
تشخیص اطلاعات فردی سخنران با استفاده از روش سرکوب اثر فونمی
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


• We investigate phoneme contributions to speaker individuality using three languages.
• Voiced and unvoiced phonemes have different effects on speaker information.
• We propose a new method to reduce phoneme-related effects on speaker individuality.
• Our method reduced speaker recognition errors by up to 67.3% across the three languages.
• The method can detect speaker individuality accurately with automatic phoneme alignment.

Feature extraction of speaker information from speech signals is a key procedure for exploring individual speaker characteristics and also the most critical part in a speaker recognition system, which needs to preserve individual information while attenuating linguistic information. However, it is difficult to separate individual from linguistic information in a given utterance. For this reason, we investigated a number of potential effects on speaker individual information that arise from differences in articulation due to speaker-specific morphology of the speech organs, comparing English, Chinese and Korean. We found that voiced and unvoiced phonemes have different frequency distributions in speaker information and these effects are consistent across the three languages, while the effect of nasal sounds on speaker individuality is language dependent. Because these differences are confounded with speaker individual information, feature extraction is negatively affected. Accordingly, a new feature extraction method is proposed to more accurately detect speaker individual information by suppressing phoneme-related effects, where the phoneme alignment is required once in constructing a filter bank for phoneme effect suppression, but is not necessary in processing feature extraction. The proposed method was evaluated by implementing it in GMM speaker models for speaker identification experiments. It is shown that the proposed approach outperformed both Mel Frequency Cepstrum Coefficient (MFCC) and the traditional F-ratio (FFCC). The use of the proposed feature has reduced recognition errors by 32.1–67.3% for the three languages compared with MFCC, and by 6.6–31% compared with FFCC. When combining an automatic phoneme aligner with the proposed method, the result demonstrated that the proposed method can detect speaker individuality with about the same accuracy as that based on manual phoneme alignment.
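
The abstract compares against an F-ratio-based feature but does not give a formula. As a rough illustration only, the sketch below uses the commonly cited definition of the F-ratio (variance of the per-speaker means divided by the average within-speaker variance, computed per frequency band). The function name `f_ratio`, the input layout, and the toy numbers are assumptions made for this example, not the authors' implementation or data.

```python
from statistics import mean, pvariance


def f_ratio(frames_by_speaker):
    """Per-band F-ratio (assumed definition, see lead-in).

    frames_by_speaker: dict mapping a speaker label to a list of frames,
    where each frame is a list of per-band values (e.g. band energies).
    Returns one ratio per band: between-speaker variance of the band means
    divided by the mean within-speaker variance of that band.
    """
    speakers = list(frames_by_speaker)
    n_bands = len(frames_by_speaker[speakers[0]][0])
    ratios = []
    for band in range(n_bands):
        # Collect this band's values separately for each speaker.
        per_speaker = [
            [frame[band] for frame in frames_by_speaker[s]] for s in speakers
        ]
        speaker_means = [mean(values) for values in per_speaker]
        between = pvariance(speaker_means)
        within = mean([pvariance(values) for values in per_speaker])
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


# Hypothetical toy data: two speakers, two frames each, two bands.
toy = {
    "A": [[1.0, 5.0], [3.0, 5.2]],
    "B": [[11.0, 5.1], [13.0, 4.9]],
}
ratios = f_ratio(toy)
```

In this toy input the first band separates the two speakers far more than the second, so it receives the larger ratio. A band with a high ratio carries more speaker-discriminating information under this definition, which is the kind of per-band weighting an F-ratio-based filter bank would rely on.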

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 57, February 2014, Pages 87–100