Temporal Difference Learning

tags
Reinforcement Learning

Temporal difference (TD) learning is a method for learning value functions from the differences between successive predictions. It was first described by (Sutton 1988).
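
As a rough illustration of the idea, here is a minimal sketch of tabular TD(0), the simplest TD method, on a small random walk. The environment, the function names (`td0_update`, `td0_random_walk`), and the parameter values are assumptions made for this example and are not taken from the note.

```python
import random


def td0_update(values, state, reward, next_state, alpha, gamma):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # TD error: the gap between the current prediction and the
    # one-step bootstrapped target.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


def td0_random_walk(num_states=5, episodes=1000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk (hypothetical setup).

    States 1..num_states are non-terminal. Indices 0 and num_states + 1
    are terminal, and their values stay at 0. Exiting on the right pays
    a reward of 1; every other transition pays 0.
    """
    rng = random.Random(seed)
    values = [0.0] * (num_states + 2)
    for _ in range(episodes):
        state = (num_states + 1) // 2  # start each episode in the middle
        while 0 < state < num_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == num_states + 1 else 0.0
            td0_update(values, state, reward, next_state, alpha, gamma)
            state = next_state
    return values[1:-1]  # drop the two terminal entries


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

For this five-state walk the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should land near those numbers, with some noise from the constant step size.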

References

(Sutton 1988) Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences, Machine Learning (1988).

sutton1988: Learning to predict by the methods of temporal differences

TLDR: This is Rich Sutton's seminal work, in which he establishes temporal difference (TD) learning as a class of methods for multi-step prediction and contrasts these methods with previous approaches. He also introduces eligibility traces and relates them to other styles of supervised learning. He then provides several examples of TD in action. Finally, he proves convergence and optimality (under certain assumptions) for these types of methods.
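
For orientation, the TD(λ) weight update is usually written in the form below. The symbols are not defined in this note and are assumptions here: w is the weight vector, P_t is the prediction made at step t, α is a step size, and λ in [0, 1] is the trace-decay parameter. The sum plays the role of the eligibility trace.

```latex
\Delta w_t = \alpha \, (P_{t+1} - P_t) \sum_{k=1}^{t} \lambda^{t-k} \, \nabla_w P_k
```

With λ = 0 only the most recent prediction's gradient is credited for the change in prediction; with λ = 1 every earlier prediction in the sequence is credited equally.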