Implicit processing of LP residual for language identification

Article ID	Journal	Published Year	Pages	File Type
6951517	Computer Speech & Language	2017	20 Pages	PDF

Abstract

Present work explores the excitation source information for the language identification (LID) task. In this work, excitation source information is captured by implicit processing of linear prediction (LP) residual signal for discriminating the languages. Raw samples of LP residual signal, its magnitude, and phase components are processed independently at sub-segmental, segmental and suprasegmental levels for extracting the language-specific excitation source information. The LID studies are carried out using 27 Indian languages from Indian Institute of Technology Kharagpur-Multi Lingual Indian Language Speech Corpus (IITKGP-MLILSC) and 11 international languages from OGI-MLTS corpus. The Gaussian mixture models (GMMs) are used in this work to model the language-specific excitation source information for LID task. From the experimental results, it can be observed that, features extracted from segmental level yields better identification accuracy (50.92%), compared to sub-segmental (47.77%) and suprasegmental levels (43.88%). Further, the evidence from all three levels is combined to obtain the complete excitation source information. Finally, we have investigated the existence of non-overlapping language-specific information present in excitation source and vocal tract features.

Keywords

Subsegmental segmental