Article ID: 485440
Journal: Procedia Computer Science
Published Year: 2016
Pages: 8 Pages
File Type: PDF
Abstract

We investigate the application of two novel lattice-constrained Viterbi training strategies to the task of improving sub-word unit (SWU) inventories that were discovered using an unsupervised sparse coding approach. The automatic determination of these SWUs remains a critical and unresolved obstacle to the development of ASR for under-resourced languages. The first lattice-constrained training strategy attempts to jointly learn a bigram SWU language model along with the evolving SWU inventory. We find that this substantially increases correspondence with expert-defined reference phonemes on the TIMIT dataset, but does little to improve pronunciation consistency. The second approach attempts to jointly infer an SWU pronunciation model for each word in the training vocabulary, and to constrain transcription using these models. We find that this lightly supervised approach again substantially increases correspondence with the reference phonemes, and in this case also improves pronunciation consistency.
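To make the first strategy concrete, the following is a minimal, illustrative sketch of Viterbi decoding over SWU labels in which frame-level emission scores are combined with bigram language-model transition scores. All names and data structures here are assumptions for illustration; the paper's actual lattice-constrained training is not described in enough detail in the abstract to reproduce.

```python
def viterbi_bigram(emissions, bigram_lp, units):
    """Decode the best sub-word unit (SWU) sequence.

    emissions: list of per-frame dicts {unit: emission log-prob}
    bigram_lp: dict (prev_unit, unit) -> bigram LM log-prob
    units:     list of SWU labels
    (Hypothetical interface; simplified to one unit per frame.)
    """
    # Initialise with the first frame's emission scores.
    delta = {u: emissions[0][u] for u in units}
    backptrs = []
    for frame in emissions[1:]:
        new_delta, ptr = {}, {}
        for u in units:
            # Best predecessor under accumulated score + bigram LM score.
            prev, score = max(
                ((p, delta[p] + bigram_lp.get((p, u), -1e9)) for p in units),
                key=lambda x: x[1],
            )
            new_delta[u] = score + frame[u]
            ptr[u] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Backtrace from the best-scoring final unit.
    best = max(delta, key=delta.get)
    path = [best]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

In joint training, the decoded sequences would be used both to re-estimate the SWU acoustic models and to update the bigram LM counts on each iteration; the sketch above shows only the constrained decoding step.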

Related Topics
Physical Sciences and Engineering Computer Science Computer Science (General)