Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
407130 | Neurocomputing | 2016 | 8 Pages |
We present a comprehensive study on the use of autoencoders for modelling text data, in which (differently from previous studies) we focus our attention on the various issues. We explore the suitability of two different models binary deep autencoders (bDA) and replicated-softmax deep autencoders (rsDA) for constructing deep autoencoders for text data at the sentence level. We propose and evaluate two novel metrics for better assessing the text-reconstruction capabilities of autoencoders. We propose an automatic method to find the critical bottleneck dimensionality for text representations (below which structural information is lost); and finally we conduct a comparative evaluation across different languages, exploring the regions of critical bottleneck dimensionality and its relationship to language perplexity.