# goudreau1994: First-order versus second-order single-layer recurrent neural networks

- tags
- Recurrent Neural Network, Machine Learning
- source
- paper
- authors
- Goudreau, M., Giles, C., Chakradhar, S., & Chen, D.
- year
- 1994

This paper goes through the difference between first and second-order recurrent neural networks.

Notation:

- \(\mathbf{x}^t \in \mathbb{R}^N\) is the input vector at time \(t\)
- \(\mathbf{h}^{t-1} \in \mathbb{R}^M\) is the hidden state at time \(t-1\)
- \(\mathbf{W}\) is the RNN weight matrix
- \(\mathbf{z}^{t} = [\mathbf{x}^t; \mathbf{h}^{t-1}]\)

**First order**

\[ \mathbf{h}^t_{i} = g\left(\sum_{j=1}^{M+N} \mathbf{W}_{ij} z^{t}_{j} \right) \]

**Second order**

\[ \mathbf{h}^t_{i} = g\left(\sum_{j=1}^{M} \sum{k=1}^{N} \mathbf{W}_{ijk} \mathbf{x}^{t}_{j} \mathbf{h}^{t-1}_{k} \right) \]

The second order RNN looks very similar to a tensor contraction across \(\mathbf{x}\) and \(\mathbf{y}\). This should be thought through as we combine different types of information in the state update. In the paper the non-linear function is the hard-limiter function

\[
g(x) = \begin{cases}
0 & \quad \text{if } x \leq 0\\\

1 & \quad \text{if } x > 0
\end{cases}
\]

The rest of the paper is dedicated to showing that the second order RNN can implement any finite state recognizer (theorem 2), while a first-order RNN cannot. There is a modification to the first-order RNN which can implement the parity problem (including several layers after the RNN).