Recurrent Neural Network

Neural Network

See other extensions: LSTM, (Wu 2016), (Chandar 2019), (Goudreau 1994), (Sutskever 2011), (Cho 2014)

(Wu 2016) Yuhuai Wu; Saizheng Zhang; Ying Zhang; Yoshua Bengio and Russ R. Salakhutdinov, On Multiplicative Integration with Recurrent Neural Networks (2016).

(Chandar 2019) Sarath Chandar; Chinnadhurai Sankar; Eugene Vorontsov; Samira Ebrahimi Kahou and Yoshua Bengio, Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies, AAAI (2019).

(Goudreau 1994) M. W. Goudreau; C. L. Giles; S. T. Chakradhar and D. Chen, First-Order versus Second-Order Single-Layer Recurrent Neural Networks, IEEE Transactions on Neural Networks (1994).

(Sutskever 2011) Ilya Sutskever; James Martens and Geoffrey Hinton, Generating Text with Recurrent Neural Networks, Proceedings of the 28th International Conference on Machine Learning (2011).

(Cho 2014) Kyunghyun Cho; Bart van Merriënboer; Dzmitry Bahdanau and Yoshua Bengio, On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (2014).

Backpropagation Through Time

General Value Functions


chandar2019: Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies

chung2014: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

This paper presents an empirical evaluation of several gated recurrent units, including LSTMs (hochreiter1997), GRUs (cho2014), and vanilla RNNs. It also describes each of the cells tested and gives a nice high-level description of the generative model employed by RNNs.
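As a rough reference for the cells compared in the paper, the vanilla RNN and GRU update rules can be sketched in NumPy. Weight names, sizes, and the small-Gaussian initialization are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8  # illustrative sizes

def init(shape):
    return rng.normal(0.0, 0.1, shape)

# Vanilla RNN: h_t = tanh(W x_t + U h_{t-1} + b)
W, U, b = init((n_hid, n_in)), init((n_hid, n_hid)), np.zeros(n_hid)

def vanilla_step(x, h):
    return np.tanh(W @ x + U @ h + b)

# GRU (cho2014): update gate z and reset gate r interpolate between
# the previous state and a candidate state.
Wz, Uz = init((n_hid, n_in)), init((n_hid, n_hid))
Wr, Ur = init((n_hid, n_in)), init((n_hid, n_hid))
Wh, Uh = init((n_hid, n_in)), init((n_hid, n_hid))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h):
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde           # convex combination

x, h = rng.normal(size=n_in), np.zeros(n_hid)
print(vanilla_step(x, h).shape, gru_step(x, h).shape)  # (8,) (8,)
```

The key structural difference the paper evaluates: the GRU's `(1 - z) * h` path lets the state pass through nearly unchanged when `z` is small, while the vanilla cell rewrites the whole state every step.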

Recurrent Neural Network, Machine Learning

goudreau1994: First-order versus second-order single-layer recurrent neural networks

Recurrent Neural Network, Machine Learning

sutskever2011: Generating text with recurrent neural networks

The main contribution of this paper is the application of RNNs to a hard language task, demonstrating their potential for language and other sequence tasks. Instead of using the usual vanilla RNN or an LSTM, they introduce the idea of multiplicative RNNs and tensor RNNs, and find these significantly improve performance on the task. They mention that the multiplicative RNNs have some optimization issues, which are mitigated through the use of second-order optimization techniques.
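The multiplicative RNN replaces the fixed hidden-to-hidden matrix with an input-dependent one, factored through a low-rank layer to avoid storing a full three-way tensor. A minimal NumPy sketch of that update, with illustrative names, sizes, and initialization (the factor rank `n_fac` is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_fac = 4, 8, 6  # n_fac: rank of the factorization (assumed)

def init(shape):
    return rng.normal(0.0, 0.1, shape)

Whx = init((n_hid, n_in))
# Input-dependent recurrence, factored as W_hh(x) = Whf diag(Wfx x) Wfh,
# so each input picks its own effective transition matrix.
Whf, Wfx, Wfh = init((n_hid, n_fac)), init((n_fac, n_in)), init((n_fac, n_hid))

def mrnn_step(x, h):
    f = (Wfx @ x) * (Wfh @ h)       # factor layer: multiplicative interaction
    return np.tanh(Whx @ x + Whf @ f)

x, h = rng.normal(size=n_in), rng.normal(size=n_hid)
h_next = mrnn_step(x, h)
```

Because the effective transition matrix changes with every input symbol, gradients can be poorly conditioned, which is why the paper resorts to second-order optimization.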

Recurrent Neural Network, Machine Learning

wu2016: On Multiplicative Integration with Recurrent Neural Networks

First they look at the gradients when the RNNs have linear activation mappings (to isolate the internal mechanisms). They measure the log of the L2 norm of the gradient at each epoch (averaged over the training set) on the Penn Treebank dataset, training with the Adam optimizer. They show the gradient norm grows much more in the vanilla architecture (which uses additive operations) than in the new multiplicative-integration architecture.
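For contrast, the additive building block and Wu et al.'s general multiplicative-integration (MI) form can be sketched side by side. The gating vectors `alpha`, `beta1`, `beta2` initialized to ones, and all names and sizes, are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid = 4, 8
W = rng.normal(0.0, 0.1, (n_hid, n_in))
U = rng.normal(0.0, 0.1, (n_hid, n_hid))
b = np.zeros(n_hid)

def additive_step(x, h):
    # vanilla building block: phi(W x + U h + b)
    return np.tanh(W @ x + U @ h + b)

# General MI form: phi(alpha * Wx * Uh + beta1 * Uh + beta2 * Wx + b).
# The gradient through h is now gated by Wx, which the paper argues
# keeps the gradient norm better behaved than pure addition.
alpha = np.ones(n_hid)
beta1 = np.ones(n_hid)
beta2 = np.ones(n_hid)

def mi_step(x, h):
    wx, uh = W @ x, U @ h
    return np.tanh(alpha * wx * uh + beta1 * uh + beta2 * wx + b)

x, h = rng.normal(size=n_in), rng.normal(size=n_hid)
print(additive_step(x, h).shape, mi_step(x, h).shape)  # (8,) (8,)
```

Setting `alpha` to zero recovers the additive block, so MI strictly generalizes it with negligible extra parameters.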

Recurrent Neural Network, Neural Network, Machine Learning