wu2016: On Multiplicative Integration with Recurrent Neural Networks

tags
Recurrent Neural Network, Neural Network, Machine Learning
source
http://papers.nips.cc/paper/6215-on-multiplicative-integration-with-recurrent-neural-networks

This paper explores a new architectural building block for recurrent neural networks: one that integrates information from the unit's internal state and the incoming data stream through a multiplicative update. They use the Hadamard product to combine the two streams, though there are other possible choices here (such as various tensor products) that they don't discuss.

Additive building block:

$\phi(\mathbf{W}\mathbf{x} + \mathbf{U}\mathbf{z} + \mathbf{b})$

Multiplicative building block:

$\phi(\mathbf{W}\mathbf{x} \odot \mathbf{U}\mathbf{z} + \mathbf{b})$
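A minimal NumPy sketch of the two blocks side by side (the dimensions, initialization, and use of tanh are illustrative choices of mine, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def step_additive(x, z, W, U, b, phi=np.tanh):
    """Additive building block: phi(W x + U z + b)."""
    return phi(W @ x + U @ z + b)

def step_multiplicative(x, z, W, U, b, phi=np.tanh):
    """Multiplicative-integration block: phi((W x) * (U z) + b),
    where * is the element-wise (Hadamard) product."""
    return phi((W @ x) * (U @ z) + b)

# Toy dimensions, purely for illustration.
d_in, d_hid = 4, 8
W = rng.standard_normal((d_hid, d_in)) * 0.1
U = rng.standard_normal((d_hid, d_hid)) * 0.1
b = np.zeros(d_hid)

x = rng.standard_normal(d_in)   # current input
z = rng.standard_normal(d_hid)  # previous hidden state

print(step_additive(x, z, W, U, b))
print(step_multiplicative(x, z, W, U, b))
```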

They also present "more general versions", which add extra bias/gating vectors so the block can blend additive and multiplicative integration rather than commit fully to one.
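If I recall the paper correctly, the most general form looks something like this (the $\boldsymbol{\alpha}$, $\boldsymbol{\beta}_1$, $\boldsymbol{\beta}_2$ vectors are learned; setting $\boldsymbol{\alpha}=\mathbf{0}$, $\boldsymbol{\beta}_1=\boldsymbol{\beta}_2=\mathbf{1}$ recovers the additive block):

$\phi(\boldsymbol{\alpha} \odot \mathbf{W}\mathbf{x} \odot \mathbf{U}\mathbf{z} + \boldsymbol{\beta}_1 \odot \mathbf{U}\mathbf{z} + \boldsymbol{\beta}_2 \odot \mathbf{W}\mathbf{x} + \mathbf{b})$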

Claims:

• This architecture integrates information in a way that handles differences in scale much better than additive blocks (i.e. small values aren't simply overwhelmed by large ones).
• The backward flow of gradients through time is less affected by the exploding and vanishing gradient problem, because the gradient includes a $\mathbf{W}\mathbf{x}_k$ term at each step (see the sketch after this list).
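A rough sketch of where that term comes from, treating $\mathbf{z}$ as the previous hidden state $\mathbf{h}_{t-1}$ (my reconstruction, not the paper's exact notation). For the additive block the recurrent Jacobian is

$\frac{\partial \mathbf{h}_t}{\partial \mathbf{h}_{t-1}} = \mathrm{diag}(\phi')\,\mathbf{U}$

whereas for the multiplicative block it becomes

$\frac{\partial \mathbf{h}_t}{\partial \mathbf{h}_{t-1}} = \mathrm{diag}(\phi' \odot \mathbf{W}\mathbf{x}_t)\,\mathbf{U}$

so at every step the Jacobian is rescaled element-wise by the input-dependent term $\mathbf{W}\mathbf{x}_t$, which is the gating the claim refers to.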

Do the experiments support the claims?