Rich Sutton

People, Reinforcement Learning

sutton1988: Learning to predict by the methods of temporal differences

TLDR*: This is Rich Sutton’s seminal paper establishing temporal-difference (TD) methods as a class of methods for multi-step prediction. It contrasts them with conventional approaches, which learn from the difference between each prediction and the final outcome, whereas TD methods learn from the difference between successive predictions. He also introduces eligibility traces and shows how TD methods relate to conventional supervised learning. He then demonstrates TD on several examples, and finally proves convergence and optimality of these methods under certain assumptions.
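To make the TD(λ) idea concrete, here is a minimal sketch in the spirit of the paper's bounded random-walk example. It assumes tabular (one-hot) features, so the prediction for a state is simply its weight, no discounting, and accumulating eligibility traces with online updates at every step. The weight change follows the TD(λ) form Δw_t = α (P_{t+1} − P_t) Σ_{k=1..t} λ^(t−k) ∇_w P_k, where the sum over past gradients is kept incrementally as the trace `e`. The function name `td_lambda_random_walk` and all parameter values are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

# Hypothetical setup: a bounded random walk with 5 non-terminal states (0..4),
# starting in the middle. Stepping left off state 0 ends the sequence with
# outcome 0; stepping right off state 4 ends it with outcome 1.
# The true predictions (probability of ending on the right) are 1/6, ..., 5/6.
N_STATES = 5


def td_lambda_random_walk(n_episodes=100, alpha=0.1, lam=0.8, seed=0):
    """Sketch of linear TD(lambda) with one-hot features and accumulating traces."""
    rng = np.random.default_rng(seed)
    w = np.full(N_STATES, 0.5)  # one weight per state; prediction P_t = w[s_t]
    for _ in range(n_episodes):
        s = N_STATES // 2
        e = np.zeros(N_STATES)  # eligibility trace: sum_k lam^(t-k) * grad P_k
        while True:
            e *= lam            # decay the contribution of earlier gradients
            e[s] += 1.0         # grad of P_t w.r.t. w is the one-hot vector for s
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:
                p_next = 0.0    # left terminal: final outcome 0
            elif s_next >= N_STATES:
                p_next = 1.0    # right terminal: final outcome 1
            else:
                p_next = w[s_next]
            delta = p_next - w[s]     # TD error: P_{t+1} - P_t
            w += alpha * delta * e    # online TD(lambda) weight update
            if s_next < 0 or s_next >= N_STATES:
                break
            s = s_next
    return w


if __name__ == "__main__":
    # Estimates should approach 1/6, 2/6, ..., 5/6 as training proceeds.
    print(np.round(td_lambda_random_walk(n_episodes=5000, alpha=0.01), 2))
```

Setting `lam=0.0` reduces the sketch to one-step TD(0), and values of λ near 1 push the updates toward the outcome-based supervised approach. The sketch applies updates online at each step for simplicity; the paper's experiments instead accumulated weight changes and applied them after each sequence or training set, which would be a small change to the loop.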