Article ID: 485448
Journal: Procedia Computer Science
Published Year: 2016
Pages: 8
File Type: PDF
Abstract

Stacked-Bottle-Neck (SBN) feature extraction is a crucial part of modern automatic speech recognition (ASR) systems. The SBN network traditionally contains a hidden layer between the BN and output layers. Recently, we have observed that an SBN architecture without this hidden layer (i.e. a direct connection from the BN layer to the output layer) performs better for a single language but fails in scenarios where a network pre-trained in a multilingual fashion is ported to a target language. In this paper, we describe two strategies that allow the direct-connection SBN network to benefit from pre-training with a multilingual net: (1) pre-training a multilingual net with the hidden layer, which is discarded before porting to the target language, and (2) using only the direct-connection SBN with triphone targets both in multilingual pre-training and in porting to the target language. The results are reported on IARPA-BABEL limited language pack (LLP) data.
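
Strategy (1) can be pictured as a change in layer topology. The sketch below is only an illustration under stated assumptions: it models a single network as a list of (name, width) pairs, and the layer names, the layer widths, the number of pre-BN hidden layers, and the helper names build_net and port_to_target are all hypothetical and do not come from the paper. It shows a multilingual net built with a hidden layer after the BN layer, and a ported net in which that hidden layer and the multilingual output layer are dropped so that a new target-language output layer connects directly to the BN layer.

```python
from typing import List, Optional, Tuple

# A layer is (name, width). All names and widths are illustrative assumptions.
Layer = Tuple[str, int]


def build_net(n_in: int, n_bn: int, n_out: int,
              post_bn_hidden: Optional[int] = None,
              n_hidden: int = 1500) -> List[Layer]:
    """Describe a bottleneck network as an ordered list of layers.

    If post_bn_hidden is given, a hidden layer is placed between the
    bottleneck (BN) layer and the output layer; otherwise the BN layer
    feeds the output layer directly.
    """
    layers: List[Layer] = [
        ("input", n_in),
        ("hidden1", n_hidden),
        ("hidden2", n_hidden),
        ("bottleneck", n_bn),
    ]
    if post_bn_hidden is not None:
        layers.append(("post_bn_hidden", post_bn_hidden))
    layers.append(("output", n_out))
    return layers


def port_to_target(multilingual: List[Layer], n_target_out: int) -> List[Layer]:
    """Sketch of strategy (1): keep everything up to and including the BN
    layer, discard whatever follows it (the post-BN hidden layer and the
    multilingual output layer), and attach a new target-language output
    layer directly to the BN layer.
    """
    names = [name for name, _ in multilingual]
    bn_idx = names.index("bottleneck")
    return multilingual[: bn_idx + 1] + [("output", n_target_out)]


# Multilingual pre-training topology: hidden layer after the BN layer.
multi = build_net(n_in=360, n_bn=80, n_out=1000, post_bn_hidden=1500)
# Target-language topology: direct BN-to-output connection.
ported = port_to_target(multi, n_target_out=300)
```

In a real system the retained layers would carry over their trained weights and the new output layer would be initialized and trained on target-language data; the sketch tracks only the topology.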

Related Topics
Physical Sciences and Engineering Computer Science Computer Science (General)