Article code: 568998
Journal code: 876514
Publication year: 2006
English article: full text, 14-page PDF
ISI article title (English)
Language identification using acoustic log-likelihoods of syllable-like units
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
Abstract

Automatic spoken language identification (LID) is the task of identifying the language of a short utterance spoken by an unknown speaker. The most successful approach to LID uses phone recognizers of several languages in parallel [Zissman, M.A., 1996. Comparison of four approaches to automatic language identification of telephone speech. IEEE Trans. Speech Audio Process. 4 (1), 31–44]. Building a parallel phone recognition (PPR) system requires segmented and labeled speech corpora. In this paper, a novel approach to LID is proposed that uses parallel syllable-like unit recognizers, in a framework similar to the PPR approach in the literature. The difference is that the sub-word unit models for each language to be recognized are generated in an unsupervised manner, without the use of segmented and labeled speech corpora. The training data of each language is first segmented into syllable-like units, and a language-dependent syllable-like unit inventory is created. These syllable-like units are then clustered using an incremental approach, which yields a set of syllable-like unit models for each language. Using these language-dependent syllable-like unit models, language identification is performed based on accumulated acoustic log-likelihoods. Our initial results on the Oregon Graduate Institute Multi-language Telephone Speech Corpus [Muthusamy, Y.K., Cole, R.A., Oshika, B.T., 1992. The OGI multi-language telephone speech corpus. In: Proceedings of Internat. Conf. Spoken Language Process., October 1992, pp. 895–898] show an LID performance of 72.3%. We further show that if only a subset of the syllable-like unit models that are unique (in some sense) is considered, the performance improves to 75.9%.
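The decision rule "identify the language with the highest accumulated acoustic log-likelihood" can be illustrated with a minimal sketch. It assumes each language's syllable-like unit inventory has already been built and that the test utterance has already been segmented into syllable-like segments of feature frames. As a deliberate simplification, each unit is modeled here by a single diagonal-covariance Gaussian with frames treated as independent, which is not the unit model used in the paper. Each segment is scored against its best-matching unit in each language, and those scores are summed. The names `UnitModel`, `language_score`, `identify_language`, `lang_a` and `lang_b` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]
Segment = Sequence[Frame]


class UnitModel:
    """Hypothetical stand-in for one syllable-like unit model.

    Simplification (not the paper's model): a single Gaussian with a
    diagonal covariance, with frames treated as independent.
    """

    def __init__(self, mean: Sequence[float], var: Sequence[float]):
        self.mean = list(mean)
        self.var = list(var)

    def frame_loglik(self, frame: Frame) -> float:
        # log N(frame; mean, diag(var)), summed over feature dimensions
        total = 0.0
        for x, m, v in zip(frame, self.mean, self.var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        return total

    def segment_loglik(self, segment: Segment) -> float:
        # Independent frames: the segment log-likelihood is the sum over frames
        return sum(self.frame_loglik(f) for f in segment)


def language_score(segments: Sequence[Segment], units: List[UnitModel]) -> float:
    # For each segment, keep the best-matching unit in this language's
    # inventory, then accumulate those log-likelihoods over the utterance.
    return sum(max(u.segment_loglik(seg) for u in units) for seg in segments)


def identify_language(segments: Sequence[Segment],
                      inventories: Dict[str, List[UnitModel]]) -> str:
    # Hypothesize the language whose inventory gives the highest accumulated score
    scores = {lang: language_score(segments, units)
              for lang, units in inventories.items()}
    return max(scores, key=scores.get)


# Toy usage: 1-D "features" and two hypothetical languages
inventories = {
    "lang_a": [UnitModel([0.0], [1.0]), UnitModel([1.0], [1.0])],
    "lang_b": [UnitModel([5.0], [1.0]), UnitModel([6.0], [1.0])],
}
segments = [[[0.1], [0.2]], [[0.9], [1.1]]]
print(identify_language(segments, inventories))  # prints "lang_a"
```

Taking the maximum over units for each segment loosely mimics a recognizer that picks the best unit per segment. Summing those per-segment scores is one plausible reading of "accumulated" log-likelihoods under the assumptions above. Restricting each inventory to a "unique" subset of units, as the abstract describes, would change only the `units` lists passed in. How uniqueness is defined is not stated in this summary and is not modeled here.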

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 48, Issue 8, August 2006, Pages 913–926