goudreau1994firstorder: First-order versus second-order single-layer recurrent neural networks
- tags
- Recurrent Neural Network, Machine Learning
- source
- paper
- authors
- Goudreau, M., Giles, C., Chakradhar, S., & Chen, D.
- year
- 1994
This paper examines the difference between first- and second-order single-layer recurrent neural networks.
Notation:
- \(\mathbf{x}^t \in \mathbb{R}^N\) is the input vector at time \(t\)
- \(\mathbf{h}^{t-1} \in \mathbb{R}^M\) is the hidden state at time \(t-1\)
- \(\mathbf{W}\) is the RNN weight matrix
- \(\mathbf{z}^{t} = [\mathbf{x}^t; \mathbf{h}^{t-1}] \in \mathbb{R}^{N+M}\) is the concatenation of the input and the previous hidden state
First order
\[ \mathbf{h}^t_{i} = g\left(\sum_{j=1}^{N+M} \mathbf{W}_{ij} \mathbf{z}^{t}_{j} \right) \]
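As a concrete reading of this update, here is a minimal numpy sketch of one first-order step. The sizes \(N\), \(M\) and the random weights are illustrative assumptions, not from the paper; \(g\) is passed in so any non-linearity (e.g. the hard-limiter defined below) can be used.

```python
import numpy as np

def first_order_step(W, x, h_prev, g):
    # z^t = [x^t; h^{t-1}], then h^t_i = g(sum_j W_ij z^t_j)
    z = np.concatenate([x, h_prev])  # shape (N + M,)
    return g(W @ z)                  # W has shape (M, N + M)

# illustrative sizes and random weights (not from the paper)
N, M = 3, 4
g = lambda a: (a > 0).astype(float)  # hard-limiter, defined below
rng = np.random.default_rng(0)
W = rng.normal(size=(M, N + M))
h = first_order_step(W, rng.normal(size=N), np.zeros(M), g)  # one update step
```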
Second order
\[ \mathbf{h}^t_{i} = g\left(\sum_{j=1}^{N} \sum_{k=1}^{M} \mathbf{W}_{ijk} \mathbf{x}^{t}_{j} \mathbf{h}^{t-1}_{k} \right) \]
The second-order update looks like a tensor contraction of \(\mathbf{W}\) with \(\mathbf{x}^t\) and \(\mathbf{h}^{t-1}\), i.e. a bilinear form in the input and the previous state. This is worth keeping in mind when thinking about how different types of information are combined in the state update; see the einsum sketch below. In the paper the non-linear function \(g\) is the hard-limiter function
\[ g(x) = \begin{cases} 0 & \quad \text{if } x \leq 0\\ 1 & \quad \text{if } x > 0 \end{cases} \]
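The second-order update with the hard-limiter can be written as a single einsum contraction, which makes the bilinear structure explicit. The sizes below are illustrative assumptions, not from the paper.

```python
import numpy as np

def g(a):
    # hard-limiter: 1 if a > 0, else 0
    return (a > 0).astype(float)

def second_order_step(W, x, h_prev):
    # h^t_i = g(sum_j sum_k W_ijk x^t_j h^{t-1}_k):
    # a contraction of the weight tensor with input and previous state
    return g(np.einsum('ijk,j,k->i', W, x, h_prev))

# illustrative sizes: W has shape (M, N, M)
N, M = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(M, N, M))
h = second_order_step(W, rng.normal(size=N), np.ones(M))
```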
The rest of the paper is dedicated to showing that a single-layer second-order RNN can implement any finite state recognizer (Theorem 2), while a single-layer first-order RNN cannot. A modification of the first-order RNN, adding several layers after the recurrent layer, can implement the parity problem.
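To make the flavour of Theorem 2 concrete, here is a hand-constructed second-order network for the parity recognizer, with one-hot states and inputs and the transition table written directly into \(\mathbf{W}\). The one-hot encoding is my sketch of this style of construction, not necessarily the paper's exact one.

```python
import numpy as np

def g(a):
    return (a > 0).astype(int)  # hard-limiter

# states: index 0 = even parity so far, index 1 = odd parity
# inputs: index 0 = bit 0, index 1 = bit 1; both one-hot encoded
M, N = 2, 2
W = np.zeros((M, N, M))
W[0, 0, 0] = 1  # read 0 in even state -> stay even
W[1, 0, 1] = 1  # read 0 in odd state  -> stay odd
W[1, 1, 0] = 1  # read 1 in even state -> flip to odd
W[0, 1, 1] = 1  # read 1 in odd state  -> flip to even

def parity(bits):
    h = np.array([1, 0])  # start in the even state
    for b in bits:
        x = np.eye(N)[b]  # one-hot input symbol
        h = g(np.einsum('ijk,j,k->i', W, x, h))
    return h.argmax()     # 1 iff the number of 1-bits is odd

assert parity([1, 0, 1, 1]) == 1  # three 1s -> odd
assert parity([1, 1, 0, 0]) == 0  # two 1s  -> even
```

With one-hot \(\mathbf{x}\) and \(\mathbf{h}\), the double sum selects a single entry \(\mathbf{W}_{ijk}\), so the weight tensor acts exactly as a transition table: this is why the second-order form can encode any finite state recognizer.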