sutton1988: Learning to predict by the methods of temporal differences

Reinforcement Learning

TLDR: This is Rich Sutton's seminal work, in which he establishes temporal-difference (TD) learning as a class of methods for multi-step prediction and contrasts these methods with previous approaches. He also introduces eligibility traces and relates them to other styles of supervised learning. He then provides several examples of TD in action. Finally, he proves convergence and optimality (under certain assumptions) for these types of methods.
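To make the TD idea concrete, here is a minimal Python sketch of TD(λ) prediction with eligibility traces on a five-state bounded random walk, in the spirit of the paper's random-walk example. It is an illustration under stated assumptions, not the paper's exact procedure: it uses a tabular (one weight per state) predictor, updates the weights online after every step rather than after whole sequences, and the function name, step size, λ value, episode count, and seed are all hypothetical choices.

```python
import random


def td_lambda_random_walk(episodes=5000, alpha=0.02, lam=0.3, seed=0):
    """Estimate the probability of ending on the right side of a random walk.

    Assumed setup (illustrative only): five non-terminal states in a row,
    each walk starts in the middle, moves left or right with equal
    probability, and ends with outcome 0 on the left or 1 on the right.
    """
    rng = random.Random(seed)
    n = 5
    w = [0.5] * n  # one prediction per non-terminal state

    for _ in range(episodes):
        e = [0.0] * n  # eligibility trace, reset at the start of each walk
        s = n // 2
        while True:
            # Decay all traces by lambda, then mark the current state.
            e = [lam * x for x in e]
            e[s] += 1.0

            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                target = 0.0  # terminated on the left
            elif s_next >= n:
                target = 1.0  # terminated on the right
            else:
                target = w[s_next]  # next prediction stands in for the outcome

            # Temporal difference: successive predictions should agree.
            delta = target - w[s]
            for i in range(n):
                w[i] += alpha * delta * e[i]

            if s_next < 0 or s_next >= n:
                break
            s = s_next
    return w


if __name__ == "__main__":
    for state, value in enumerate(td_lambda_random_walk(), start=1):
        print(f"state {state}: predicted {value:.3f}, ideal {state / 6:.3f}")
```

The ideal predictions for this walk are 1/6, 2/6, ..., 5/6, so the learned values can be compared against them directly. Setting `lam=1.0` makes each update depend on the eventual outcome rather than on the next prediction, which is the sense in which TD methods connect to conventional supervised learning.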

Impact: This paper has had a long-lasting impact, primarily by establishing temporal-difference learning and, in the process, establishing the field of reinforcement learning.