LSTM

tags
Recurrent Neural Network, Neural Network, Machine Learning
source
https://colah.github.io/posts/2015-08-Understanding-LSTMs/

The Long Short-Term Memory (LSTM) recurrent unit was first developed by Hochreiter and Schmidhuber (Hochreiter 1997), who showed the architecture could extend the effective horizon of the predictions made by recurrent networks. The main idea of the architecture is to keep a linear path back through time to deal with the vanishing gradient problem. Concretely, the unit carries a memory cell which is modified through various gates in the architecture, but whose temporal connection is always linear, meaning there are only a few non-linear operations between the current hidden state and a previous observation.
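As a sketch in the standard notation (not spelled out in this note), the cell state c_t is updated additively,

    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

where f_t is the forget gate, i_t the input gate, and \tilde{c}_t the candidate values; when f_t is close to one, gradients can flow from c_t back to c_{t-1} without passing through a squashing non-linearity.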

While the architecture was developed in (Hochreiter 1997), it became popular only after it was later refined into the form most commonly used today, with Backpropagation Through Time as the base training algorithm.

The architecture is made up of several gates: an input gate (a sigmoid layer paired with a tanh candidate layer), a forget gate, and an output gate. The architecture has been modified considerably since, and several variants exist, including the GRU.
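A minimal NumPy sketch of one step of this common modern formulation (the function and variable names are illustrative, not taken from any particular library):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, b):
        # W maps the concatenated [h_prev, x] to the stacked pre-activations
        # of the four layers below; b is the matching bias; H is the hidden size.
        H = h_prev.shape[0]
        z = W @ np.concatenate([h_prev, x]) + b
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        o = sigmoid(z[2*H:3*H])    # output gate
        g = np.tanh(z[3*H:4*H])    # candidate cell values (tanh layer)
        c = f * c_prev + i * g     # additive cell-state update (the linear path)
        h = o * np.tanh(c)         # new hidden state
        return h, c

The line c = f * c_prev + i * g is the linear temporal connection discussed above; every other interaction with the memory passes through a gate.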

(Hochreiter 1997) Sepp Hochreiter and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation, 9(8), pp. 1735–1780 (1997).

Recurrent Neural Network

chandar2019: Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies

chung2014: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

This paper presents an empirical evaluation of several recurrent units, including LSTMs (hochreiter1997), GRUs (cho2014), and vanilla RNNs. The paper also provides descriptions of the different cells tested and a nice high-level description of the generative model employed by RNNs.
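For reference, the GRU cell compared here is usually written with an update gate z_t and a reset gate r_t (standard formulation; notation may differ slightly from the paper):

    z_t = \sigma(W_z x_t + U_z h_{t-1})
    r_t = \sigma(W_r x_t + U_r h_{t-1})
    \tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

so the GRU merges the memory cell and hidden state, and uses a single update gate in place of separate input and forget gates.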

sutskever2011: Generating text with recurrent neural networks

The main contribution of this paper is the application of RNNs to hard language tasks, showing their potential for language and other sequence tasks. Instead of using the usual vanilla RNN or an LSTM, they introduce the idea of multiplicative RNNs and tensor RNNs, and they find these significantly improve performance on the tasks. They mention that the multiplicative RNNs have some optimization issues, which are mitigated through the use of second-order optimization techniques.
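A rough sketch of the multiplicative recurrence (my notation for the commonly cited factorized form, not copied from the paper): the hidden-to-hidden weights are made input-dependent through an intermediate factor layer,

    h_t = \tanh(W_{hx} x_t + W_{hf} \, \mathrm{diag}(W_{fx} x_t) \, W_{fh} h_{t-1} + b_h)

so each input symbol effectively selects its own recurrent weight matrix without materializing a full third-order tensor.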