LSTM

tags
Recurrent Neural Network, Neural Network, Machine Learning
source
https://colah.github.io/posts/2015-08-Understanding-LSTMs/

The Long Short-Term Memory recurrent unit was first developed in (Hochreiter and Schmidhuber 1997), who showed that the architecture could extend the effective horizon of the predictions made by recurrent networks. The main idea of the architecture is to keep a linear path back through time in order to deal with the vanishing gradient problem. The unit carries a memory component (the cell state) which is modified through various gates, but whose temporal connection is always linear, so only a few non-linear operations sit between the current hidden state and a previous observation.
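
Concretely, in the common modern formulation (notation assumed here, following the linked colah post rather than the original paper), the cell state is carried forward through an additive update:

$$ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t $$

where \(f_t\) and \(i_t\) are the forget and input gates and \(\tilde{c}_t\) is the candidate cell value. Because \(c_{t-1}\) is only scaled element-wise rather than pushed through a squashing non-linearity, gradients can flow back through many time steps along this path.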

While the architecture was developed in (Hochreiter and Schmidhuber 1997), it became popular after later refinements into the form most commonly used today, trained with Backpropagation Through Time as the base algorithm.

The architecture is made up of several gates: two Input Gates (a sigmoid input gate and a tanh layer that proposes candidate cell values), a Forget Gate, and an Output Gate. The architecture has been modified considerably since, and several variants exist, including the GRU.
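
A minimal numpy sketch of a single step through these gates, assuming the modern formulation with a forget gate; the weight layout and names here are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of an LSTM cell.

    Assumed layout: W has shape (4 * hidden, input + hidden) and b has shape
    (4 * hidden,); the four row-blocks hold the forget, input, candidate, and
    output parameters.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])        # forget gate
    i = sigmoid(z[1 * hidden:2 * hidden])        # input gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell values
    o = sigmoid(z[3 * hidden:4 * hidden])        # output gate
    c = f * c_prev + i * c_tilde                 # additive (linear) cell update
    h = o * np.tanh(c)                           # new hidden state
    return h, c
```

The key line is the cell update: the previous cell state is only gated and added to, which is the linear temporal path discussed above.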

Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80.