Article code | Journal code | Publication year | English article | Full-text version
566077 | 875927 | 2011 | 16 pages, PDF | Free download
English title of the ISI article
Listeners’ weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract

The quality of current commercial speech synthesis systems is now so high that system improvements are being made at subtle sub- and supra-segmental levels. Human perceptual evaluation of such subtle improvements requires a highly sophisticated level of perceptual attention to specific acoustic characteristics or cues. However, it is not well understood what acoustic cues listeners attend to by default when asked to evaluate synthetic speech. It may, therefore, be potentially quite difficult to design an evaluation method that allows listeners to concentrate on only one dimension of the signal, while ignoring others that are perceptually more important to them.

The aim of the current study was to determine which acoustic characteristics of unit-selection synthetic speech are most salient to listeners when evaluating the naturalness of such speech. This study made use of multidimensional scaling techniques to analyse listeners’ pairwise comparisons of synthetic speech sentences. Results indicate that listeners place a great deal of perceptual importance on the presence of artifacts and discontinuities in the speech, somewhat less importance on aspects of segmental quality, and very little importance on stress/intonation appropriateness. These relative differences in importance will impact on listeners’ ability to attend to these different acoustic characteristics of synthetic speech, and should therefore be taken into account when designing appropriate methods of synthetic speech evaluation.
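
The abstract names multidimensional scaling of pairwise comparisons but does not say which variant the study used. As a rough illustration only, the sketch below shows classical (Torgerson) MDS, which may differ from the authors' method: it turns a symmetric matrix of pairwise dissimilarities into coordinates whose distances approximate those dissimilarities. The function name `classical_mds`, its parameters, and the toy rectangle data are all hypothetical, and the power-iteration eigen-solver assumes the dissimilarities are close to Euclidean.

```python
import math
import random


def classical_mds(dissim, dims=2, iters=500, seed=0):
    """Embed n items in `dims` dimensions from a symmetric n x n
    dissimilarity matrix (hypothetical helper, classical/Torgerson MDS)."""
    n = len(dissim)
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J.
    # Symmetry is assumed, so row means double as column means.
    sq = [[d * d for d in row] for row in dissim]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration: converges to the dominant eigenvector of b.
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the matching eigenvalue.
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        if eigval <= 1e-12:
            break  # no further positive dimension to extract
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy data (not from the study): corners of a 3 x 4 rectangle.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
dissim = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dissim)
```

In a listening experiment, `dissim` would instead hold averaged listener dissimilarity judgements for each pair of stimuli, and the recovered dimensions would then need to be interpreted against acoustic properties of the stimuli.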

Research highlights
► When evaluating the naturalness of unit-selection synthetic speech, listeners place different degrees of perceptual importance on different characteristics of the acoustic speech stream.
► Listeners are highly sensitive to many aspects of discontinuities at joins between units, and joins play a large role in listeners’ judgements of unit-selection synthetic speech naturalness.
► Measures of unit appropriateness play a smaller role than those of join appropriateness in listeners’ judgements of unit-selection synthetic speech naturalness; listeners’ naturalness judgements are affected by very specific and context-dependent aspects of unit acceptability.
► Stress/intonation appropriateness plays only a small role in listeners’ judgements of unit-selection synthetic speech naturalness.
► Interactions can be observed between different characteristics of unit-selection synthetic speech: listeners appear to be better able to judge unit appropriateness when only high quality joins are present.

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 53, Issue 3, March 2011, Pages 311–326