Article ID Journal Published Year Pages File Type
407130 Neurocomputing 2016 8 Pages PDF
Abstract

We present a comprehensive study on the use of autoencoders for modelling text data, in which (differently from previous studies) we focus our attention on the various issues. We explore the suitability of two different models binary deep autencoders (bDA) and replicated-softmax deep autencoders (rsDA) for constructing deep autoencoders for text data at the sentence level. We propose and evaluate two novel metrics for better assessing the text-reconstruction capabilities of autoencoders. We propose an automatic method to find the critical bottleneck dimensionality for text representations (below which structural information is lost); and finally we conduct a comparative evaluation across different languages, exploring the regions of critical bottleneck dimensionality and its relationship to language perplexity.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,