| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 4973727 | Computer Speech & Language | 2017 | 23 Pages | |
Abstract
We propose a multi-style learning (multi-style training + deep learning) procedure that relies on deep denoising autoencoders (DAEs) to extract and organize the most discriminative information in a training database. Traditionally, multi-style training requires either collecting or artificially creating data samples (e.g., by noise injection or data combination) and training a deep neural network (DNN) on all of these conditions. To expand the applicability of deep learning, the present study instead adopts a DAE to augment the original training set. First, a DAE is used to synthesize data that captures useful structure in the input distribution. Next, this synthetic data is mixed into the original training set to exploit the capability of DNN classifiers to learn complex decision boundaries under heterogeneous conditions. By assigning a DAE to synthesize additional examples of representative variations, multi-style learning makes class boundaries less sensitive to corruption by forcing the back-end DNNs to focus on the most discriminative patterns. Moreover, this technique reduces the cost and time of data collection and is easy to incorporate into Internet of Things (IoT) applications. Results showed that these data-mixed DNNs provided consistent performance improvements without requiring any preprocessing of the test sets.
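The augmentation recipe in the abstract (corrupt inputs, train a DAE to reconstruct them, then mix DAE-synthesized examples back into the training set) can be sketched with a minimal NumPy example. This is an illustration under assumed settings, not the paper's architecture: the toy data, the single `tanh` hidden layer, the Gaussian corruption level (0.3), and all variable names are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: 200 samples, 8 features (a stand-in for speech features).
X = rng.normal(size=(200, 8))

# --- Minimal one-hidden-layer denoising autoencoder (illustrative only) ---
d, h = X.shape[1], 4
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)

def forward(x):
    hidden = np.tanh(x @ W1 + b1)       # encoder
    return hidden, hidden @ W2 + b2     # decoder (linear output)

lr = 0.01
for epoch in range(50):
    noisy = X + rng.normal(scale=0.3, size=X.shape)  # corrupt the inputs
    hidden, recon = forward(noisy)
    err = recon - X                      # reconstruct the *clean* targets
    # Gradient descent on squared reconstruction loss.
    gW2 = hidden.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - hidden**2)
    gW1 = noisy.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Multi-style augmentation: synthesize DAE outputs for fresh corrupted
# copies and mix them into the original training set; a back-end DNN
# classifier would then be trained on X_augmented.
_, synthetic = forward(X + rng.normal(scale=0.3, size=X.shape))
X_augmented = np.concatenate([X, synthetic], axis=0)
```

The key point the sketch captures is that the test set needs no preprocessing: the DAE is used only to enlarge and diversify the training set, so the deployed classifier sees raw inputs at test time.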
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Payton Lin, Dau-Cheng Lyu, Fei Chen, Syu-Siang Wang, Yu Tsao