# goudreau1994: First-order versus second-order single-layer recurrent neural networks

tags
Recurrent Neural Network, Machine Learning
source
paper
authors
Goudreau, M., Giles, C., Chakradhar, S., & Chen, D.
year
1994

This paper goes through the difference between first and second-order recurrent neural networks.

Notation:

• $$\mathbf{x}^t \in \mathbb{R}^N$$ is the input vector at time $$t$$
• $$\mathbf{h}^{t-1} \in \mathbb{R}^M$$ is the hidden state at time $$t-1$$
• $$\mathbf{W}$$ is the RNN weight matrix
• $$\mathbf{z}^{t} = [\mathbf{x}^t; \mathbf{h}^{t-1}]$$

First order

$\mathbf{h}^t_{i} = g\left(\sum_{j=1}^{M+N} \mathbf{W}_{ij} z^{t}_{j} \right)$

Second order

$\mathbf{h}^t_{i} = g\left(\sum_{j=1}^{M} \sum{k=1}^{N} \mathbf{W}_{ijk} \mathbf{x}^{t}_{j} \mathbf{h}^{t-1}_{k} \right)$

The second order RNN looks very similar to a tensor contraction across $$\mathbf{x}$$ and $$\mathbf{y}$$. This should be thought through as we combine different types of information in the state update. In the paper the non-linear function is the hard-limiter function

$g(x) = \begin{cases} 0 & \quad \text{if } x \leq 0\\\ 1 & \quad \text{if } x > 0 \end{cases}$

The rest of the paper is dedicated to showing that the second order RNN can implement any finite state recognizer (theorem 2), while a first-order RNN cannot. There is a modification to the first-order RNN which can implement the parity problem (including several layers after the RNN).