ADAM
An optimizer that effectively combines RMSProp and Momentum. The algorithm is as follows: using the gradient \(g_t\), we compute exponentially decaying moving averages of the first and second moments.
\begin{align*} m_t &= \beta_m m_{t-1} + (1-\beta_m) g_t \\ v_t &= \beta_v v_{t-1} + (1-\beta_v) g^2_t \end{align*}
Both of these estimated averages are biased towards zero (they are initialized at zero), so to unbias them we scale each by \(\frac{1}{1-\beta^t}\) with the corresponding \(\beta_m\) or \(\beta_v\):
\begin{align*} \hat{m}_t &= \frac{m_t}{1-\beta_m^t} \\ \hat{v}_t &= \frac{v_t}{1-\beta_v^t} \end{align*}
The first and second moments are then used to calculate the update to the weights:
\begin{align*} \theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t \end{align*}
where \(\eta, \epsilon, \beta_m, \beta_v\) are all hyperparameters.
Typical settings:
\begin{align*} \beta_m &= 0.9 \\ \beta_v &= 0.999 \\ \epsilon &= 10^{-8} \\ \eta &= 0.001 \end{align*}
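The equations above can be sketched as a single update function. This is a minimal NumPy illustration, not a reference implementation; the name `adam_step` and the toy quadratic objective are my own.

```python
import numpy as np

def adam_step(theta, g, m, v, t, eta=0.001, beta_m=0.9, beta_v=0.999, eps=1e-8):
    """One Adam update; t is the 1-indexed step count."""
    m = beta_m * m + (1 - beta_m) * g        # first-moment moving average
    v = beta_v * v + (1 - beta_v) * g ** 2   # second-moment moving average
    m_hat = m / (1 - beta_m ** t)            # bias correction
    v_hat = v / (1 - beta_v ** t)
    theta = theta - eta / (np.sqrt(v_hat) + eps) * m_hat
    return theta, m, v

# Toy example: minimize f(x) = x^2 starting from x = 5 (gradient is 2x).
theta = np.array([5.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, eta=0.1)
# theta is driven toward the minimum at 0
```

Note that the effective step size is bounded by roughly \(\eta\), since \(\hat{m}_t / \sqrt{\hat{v}_t}\) is on the order of one when gradients are consistent.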
Links to this note:
- Power Grid Failures
- Human-in-the-Loop Reinforcement Learning
- Data Driven PDE Solvers
- Human-in-the-Loop RL (Industrial)
- karniadakis2021physicsinformed: Physics-Informed Machine Learning
- Current Learning Objectives
- Power Systems Control
- kovachki2024neural: Neural Operator: Learning Maps Between Function Spaces
- Matt Taylor
- RL for Industrial Control
- Zuckerman Institute
- zhang2021learning: Learning Causal State Representations of Partially Observable Environments
- wu2016multiplicative: On Multiplicative Integration with Recurrent Neural Networks
- white2017unifying: Unifying Task Specification in Reinforcement Learning
- weng2021how: How to Train Really Large Models on Many GPUs? | Lil'Log
- wang2017learning: Learning to reinforcement learn
- veeriah2019discovery: Discovery of Useful Questions as Auxiliary Tasks
- vanhasselt2015learning: Learning to Predict Independent of Span
- Upperbound 2023
- Unsupervised Learning
- Types of Learning
- Transformer
- synofzik2013experience: The experience of agency: an interplay between prediction and postdiction
- Taylor Series Expansion
- sutton2020john: John McCarthy's definition of intelligence
- sutskever2011generating: Generating text with recurrent neural networks
- Supervised Learning
- Support Vector Machines
- Sufficient Statistic
- subramanian2020approximate: Approximate information state for approximate planning and reinforcement learning in partially observed systems
- Study Plan
- Subbasis
- stock2004short: A short history of ideo-motor action
- Stochastic Processes
- sternberg2016cognitive: Cognitive Psychology
- Stationary Point
- soga2009predictive: Predictive and postdictive mechanisms jointly contribute to visual awareness
- spratling2017review: A review of predictive coding algorithms
- singh2003learning: Learning Predictive State Representations
- Sigmoid Function
- Shapley Values
- Semi-supervised Learning
- Semiotic
- Self-supervised Learning
- scholkopf2019causality: Causality for Machine Learning
- schwarzer2021pretraining: Pretraining Representations for Data-Efficient Reinforcement Learning
- russell2004history: History of Western Philosophy
- Rosenbrock function
- roy2018editorial: Editorial: Representation in the Brain
- Richard Ernest Bellman
- Resampling
- Reward
- Reproducibility in Science
- Reinforcement Learning in the Brain
- Reinforcement Learning: An Introduction
- reed2022generalist: A Generalist Agent
- rao1999predictive: Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects
- Python
- Pytorch
- Psychology
- Puddle World
- Probability via Expectations
- Probability Inequalities
- Pretraining for Reinforcement Learning
- Pragmatism
- Predictive Knowledge
- POMDP
- Policy
- Policy Improvement
- Perpendicular Distance
- Peirce Semiotic
- Org Roam
- Partially Ordered Set
- Order Theory
- Optimization
- Offline RL (Modl)
- OpenAI
- Offline Reinforcement Learning
- Notes for Students
- Off-policy Reinforcement Learning
- Neurotransmitter
- niv2009reinforcement: Reinforcement learning in the brain
- Neurons
- Mountain Car
- mohamed2019monte: Monte Carlo Gradient Estimation in Machine Learning
- Model-based RL
- mnih2014recurrent: Recurrent Models of Visual Attention
- Metric Space
- Meta (Company)
- Maximum Likelihood Estimation
- Markov's Inequality
- Markov Decision Process
- machado2018revisiting: Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
- littman2002predictive: Predictive Representations of State
- Linear Regression
- Linear Map
- Linear Programming
- Linear Algebra
- lehnert2017advantages: Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning
- Large Language Models
- Laplace Transform
- Lagrange multipliers
- kostas2019asynchronous: Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
- Kolmogorov Complexity
- Kernel Function
- KL Divergence
- kearney2019making: Making Meaning: Semiotics Within Predictive Knowledge Architectures
- John Hopfield
- james2004learning: Learning and Discovery of Predictive State Representations in Dynamical Systems with Reset
- Job Hunt
- Interview Review Material
- Integral Transform
- Inner Product Space
- Infimum and Supremum
- Incentive Salience
- Hypothesis
- huang2011predictive: Predictive Coding
- hopfield1985neural: ``Neural'' computation of decisions in optimization problems
- Homogeneity
- Hoeffding Inequality
- hochreiter1997long: LONG SHORT-TERM MEMORY
- Hilbert Space
- henderson2018deep: Deep Reinforcement Learning That Matters
- GRU
- Gradient Descent
- GPT4
- GPT3
- goodman2016what: What Does Research Reproducibility Mean?
- gonzalez-soto2019reinforcement: Reinforcement Learning is not a Causal problem
- General Value Functions
- Function Space
- Experience Replay
- Environments
- Empirical Risk Minimization
- Eigenvalues and Eigenvectors
- Efficient Coding
- Dynamic Programming
- Dopaminergic Neurons
- Duality
- Dopamine
- Dirac Delta Function
- Discrete
- Dimensionality Reduction
- Developmental Reinforcement Learning
- DeepMind Lab
- dayan1993improving: Improving Generalization for Temporal Difference Learning: The Successor Representation
- Deep Learning
- Cross-Entropy
- Curiosity
- Correlation
- Critterbot
- Converges
- Continuous
- Control Theory
- Compute Canada
- Connectionist Network
- Company
- cogprints316: Facing Up to the Problem of Consciousness
- colombo2014deep: Deep and beautiful. The reward prediction error hypothesis of dopamine
- Cognitive Revolution
- Cognitive Science
- Classification
- chung2014empirical: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- ChatGPT
- Causality
- Cauchy Criterion
- Calculus
- byrd2019what: What is the Effect of Importance Weighting in Deep Learning?
- Braindump
- bouthillier2019unreproducible: Unreproducible Research is Reproducible
- Bootstrapping (statistics)
- Bellman Equation
- Biased Competition
- Behaviorism
- Behavioral Science
- barreto2018successor: Successor Features for Transfer in Reinforcement Learning
- Behavior-Suite
- badia2020agent57: Agent57: Outperforming the Atari Human Benchmark
- Backpropagation Through Time
- Auxiliary Tasks
- Backpropagation
- Autocorrelation
- Atari
- Additivity
- Active Learning
- Actor Critic
- SVD
- Principal Component Analysis