Article ID: 6863897
Journal: Neurocomputing
Published Year: 2018
Pages: 36
File Type: PDF
Abstract
Long short-term memory (LSTM) is an extension of the recurrent neural network (RNN) and has achieved excellent performance on various tasks, especially sequential problems. The LSTM is chain-structured, and its architecture is limited by sequential information propagation; in practice, it struggles with problems involving very long-term dependencies. Recent studies have indicated that adding recurrent skip connections across multiple timescales may help the RNN improve its performance on long-term dependencies. Moreover, capturing local features can improve the performance of the RNN. In this paper, we propose a novel architecture (Att-LSTM) for the LSTM, which connects the hidden states of consecutive previous time steps to the current time step and applies an attention mechanism over these hidden states. This architecture can not only capture local features effectively but also help learn very long-distance correlations in an input sequence. We evaluate Att-LSTM on various sequential tasks, such as the adding problem, sequence classification, and character-level language modeling. In addition, to demonstrate the generalization and practicality of the novel architecture, we design a character-level hierarchical Att-LSTM and refine the word representation with a highway network. This hierarchical model achieves excellent performance on question classification.
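The abstract only sketches the architecture at a high level. The snippet below is a minimal, hedged illustration of the general idea described there (attending over the hidden states of a window of recent time steps and feeding the attended summary back into the LSTM cell); the class name AttLSTMSketch, the window size, the dot-product scoring function, and all other details are assumptions for illustration, not the paper's actual specification.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttLSTMSketch(nn.Module):
    """Illustrative sketch only: an LSTM cell whose input at each step is
    augmented with an attention-weighted summary of the hidden states from
    the previous `window` time steps (hypothetical parameterization)."""
    def __init__(self, input_size, hidden_size, window=5):
        super().__init__()
        self.window = window
        # Cell input = current token features + attended context vector
        self.cell = nn.LSTMCell(input_size + hidden_size, hidden_size)
        self.attn = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x):  # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        history = []   # recent hidden states used as the attention memory
        outputs = []
        for t in range(seq_len):
            if history:
                # Attend over the last `window` hidden states
                mem = torch.stack(history[-self.window:], dim=1)      # (batch, k, hidden)
                scores = torch.einsum('bkh,bh->bk', self.attn(mem), h)
                weights = F.softmax(scores, dim=1)
                context = torch.einsum('bk,bkh->bh', weights, mem)    # (batch, hidden)
            else:
                context = h  # zeros at the first step
            h, c = self.cell(torch.cat([x[t], context], dim=-1), (h, c))
            history.append(h)
            outputs.append(h)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)

In this sketch the attention memory is truncated to a fixed window, which mirrors the abstract's claim of capturing local features while still letting information from earlier steps reach the current step through the attended context.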
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence