LSTM

tags
Recurrent Neural Network, Neural Network, Machine Learning
source
https://colah.github.io/posts/2015-08-Understanding-LSTMs/

The Long Short-Term Memory (LSTM) recurrent unit was first developed by Hochreiter and Schmidhuber (Hochreiter 1997), who showed the architecture could extend the effective horizon of the predictions made by recurrent networks. The main idea of the architecture is to keep a linear path back through time to deal with the vanishing gradient problem. Concretely, the unit carries a memory cell which is modified through various gates in the architecture, but whose temporal connection is always linear, meaning there are only a few non-linear operations between the current hidden state and a previous observation.
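As a sketch in the standard notation (not spelled out in this note), the cell state c_t is updated additively,

    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

where f_t is the forget gate, i_t the input gate, and \tilde{c}_t the candidate values; when f_t is close to one, gradients can flow from c_t back to c_{t-1} without passing through a squashing non-linearity.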

While the architecture was developed in (Hochreiter 1997), it became popular only after it was later refined into the form most commonly used today, with Backpropagation Through Time as the base training algorithm.

The architecture is made up of several gates: an input gate (a sigmoid layer paired with a tanh candidate layer), a forget gate, and an output gate. The architecture has been modified considerably since, and several variants exist, including the GRU.
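A minimal NumPy sketch of one step of this common modern formulation (the function and variable names are illustrative, not taken from any particular library):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, b):
        # W maps the concatenated [h_prev, x] to the stacked pre-activations
        # of the four layers below; b is the matching bias; H is the hidden size.
        H = h_prev.shape[0]
        z = W @ np.concatenate([h_prev, x]) + b
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        o = sigmoid(z[2*H:3*H])    # output gate
        g = np.tanh(z[3*H:4*H])    # candidate cell values (tanh layer)
        c = f * c_prev + i * g     # additive cell-state update (the linear path)
        h = o * np.tanh(c)         # new hidden state
        return h, c

The line c = f * c_prev + i * g is the linear temporal connection discussed above; every other interaction with the memory passes through a gate.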

(Hochreiter 1997) Sepp Hochreiter and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation, 9(8), pp. 1735–1780 (1997).

Recurrent Neural Network

chandar2019: Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies

chung2014: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

This paper presents an empirical evaluation of several recurrent units, including LSTMs (hochreiter1997), GRUs (cho2014), and vanilla RNNs. The paper also provides descriptions of the different cells tested and a nice high-level description of the generative model employed by RNNs.
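For reference, the GRU cell compared here is usually written with an update gate z_t and a reset gate r_t (standard formulation; notation may differ slightly from the paper):

    z_t = \sigma(W_z x_t + U_z h_{t-1})
    r_t = \sigma(W_r x_t + U_r h_{t-1})
    \tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

so the GRU merges the memory cell and hidden state, and uses a single update gate in place of separate input and forget gates.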

sutskever2011: Generating text with recurrent neural networks

The main contribution of this paper is the application of RNNs to hard language tasks, showing their potential for language and other sequence tasks. Instead of using the usual vanilla RNN or an LSTM, they introduce the idea of multiplicative RNNs and tensor RNNs, and they find these significantly improve performance on the tasks. They mention that the multiplicative RNNs have some optimization issues, which are mitigated through the use of second-order optimization techniques.
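A rough sketch of the multiplicative recurrence (my notation for the commonly cited factorized form, not copied from the paper): the hidden-to-hidden weights are made input-dependent through an intermediate factor layer,

    h_t = \tanh(W_{hx} x_t + W_{hf} \, \mathrm{diag}(W_{fx} x_t) \, W_{fh} h_{t-1} + b_h)

so each input symbol effectively selects its own recurrent weight matrix without materializing a full third-order tensor.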