Article code: 568487 | Journal code: 1452020 | Publication year: 2016 | English article: 21-page PDF
English title of the ISI article
Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework
Persian translation of the title
تبدیل داده محور صوتی حروف الفباء به واج در چارچوب مدل سازی واژگانی احتمالاتی
Keywords
Grapheme-to-phoneme conversion; probabilistic lexical modeling framework; Kullback–Leibler divergence-based hidden Markov model; automatic speech recognition; lexicon development
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
English abstract


• A novel HMM-based grapheme-to-phoneme conversion (G2P) formalism is presented.
• Local classification G2P methods are shown to be a particular case of this formalism.
• Based on the formalism, an acoustic G2P approach (aG2P) is proposed.
• aG2P yields comparable performance to state-of-the-art G2P methods at ASR level.
• aG2P is potentially useful for lexicon development for low-resourced languages.

One of the primary steps in building automatic speech recognition (ASR) and text-to-speech systems is the development of a phonemic lexicon that maps each word to its pronunciation as a sequence of phonemes. Phonemic lexicons can be developed by humans using linguistic knowledge; however, this is a costly and time-consuming task. To facilitate this process, grapheme-to-phoneme conversion (G2P) techniques are used in which, given an initial phonemic lexicon, the relationship between graphemes and phonemes is learned through data-driven methods. This article presents a novel G2P formalism that learns the grapheme-to-phoneme relationship through acoustic data and potentially relaxes the need for an initial phonemic lexicon in the target language. The formalism involves a training part followed by an inference part. In the training part, the grapheme-to-phoneme relationship is captured in a probabilistic lexical modeling framework: a hidden Markov model (HMM) is trained in which each HMM state, representing a grapheme, is parameterized by a categorical distribution over phonemes. In the inference part, given the orthographic transcription of a word and the learned HMM, the most probable sequence of phonemes is inferred. We show that the recently proposed acoustic G2P approach in the Kullback–Leibler divergence-based HMM (KL-HMM) framework is a particular case of this formalism. We then benchmark the approach against two popular G2P approaches, namely the joint multigram approach and the decision tree-based approach. Our experimental studies on English and French show that, despite relatively poor performance at the pronunciation level, the performance of the proposed approach is not significantly different from that of state-of-the-art G2P methods at the ASR level.
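
As a rough illustration of the inference step described in the abstract, the toy sketch below decodes a phoneme sequence from per-grapheme categorical distributions using a small Viterbi search. It is not the authors' implementation: the probabilities, the `-` null symbol for silent letters, the function name `viterbi_g2p`, and the optional phoneme transition table are all assumptions made for illustration, and the real model learns its distributions from acoustic data.

```python
import math

# Hypothetical toy parameters (not from the article): each grapheme state
# holds a categorical distribution over phonemes. "-" is an assumed null
# symbol standing in for a silent letter.
GRAPHEME_TO_PHONEME = {
    "c": {"k": 0.7, "s": 0.3},
    "a": {"ae": 0.6, "ey": 0.4},
    "t": {"t": 0.9, "-": 0.1},
}


def viterbi_g2p(word, emissions, transition=None, floor=1e-6):
    """Return the most probable phoneme sequence for `word` in the toy model.

    `transition` optionally maps (previous phoneme, current phoneme) pairs
    to probabilities; unseen pairs and unseen emissions fall back to `floor`.
    """
    phonemes = sorted({p for dist in emissions.values() for p in dist})

    def log_trans(prev, cur):
        if transition is None:
            return 0.0  # uniform transitions add the same constant everywhere
        return math.log(transition.get((prev, cur), floor))

    # best[p] = (log-score of the best path ending in phoneme p, that path)
    first = emissions[word[0]]
    best = {p: (math.log(first.get(p, floor)), [p]) for p in phonemes}

    for grapheme in word[1:]:
        dist = emissions[grapheme]
        new_best = {}
        for cur in phonemes:
            # Pick the predecessor that maximizes the path score into `cur`.
            prev = max(phonemes, key=lambda q: best[q][0] + log_trans(q, cur))
            score = (best[prev][0] + log_trans(prev, cur)
                     + math.log(dist.get(cur, floor)))
            new_best[cur] = (score, best[prev][1] + [cur])
        best = new_best

    path = max(best.values(), key=lambda item: item[0])[1]
    return [p for p in path if p != "-"]  # drop null symbols
```

With uniform transitions the search reduces to choosing the most probable phoneme for each grapheme independently, which loosely mirrors the local classification view mentioned in the highlights; supplying a transition table lets context change the decoded phonemes.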

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 80, June 2016, Pages 1–21