ADAM
An optimizer that combines RMSProp and Momentum. The algorithm is as follows: using the gradient \(g_t\), we compute exponential moving averages (decaying averages) of the first and second moments of the gradient.
\begin{align*} m_t &= \beta_m m_{t-1} + (1-\beta_m) g_t \\ v_t &= \beta_v v_{t-1} + (1-\beta_v) g^2_t \end{align*}
Both of these estimates are biased towards zero (since \(m_0 = v_0 = 0\)), so we correct the bias by scaling each by \(\frac{1}{1-\beta^t}\), using the corresponding \(\beta_m\) or \(\beta_v\):
\begin{align*} \hat{m}_t &= \frac{m_t}{1-\beta_m^t} \\ \hat{v}_t &= \frac{v_t}{1-\beta_v^t} \end{align*}
The bias-corrected first and second moments are then used to update the weights:
\begin{align*} \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t \end{align*}
where \(\eta, \epsilon, \beta_m, \beta_v\) are all hyperparameters.
Typical settings:
\begin{align*} \beta_m &= 0.9 \\ \beta_v &= 0.999 \\ \epsilon &= 10^{-8} \end{align*}
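The update rule above can be sketched directly in NumPy. This is a minimal illustration (the function name `adam_step` and the quadratic test objective are my own choices, not from a particular library); note that the bias correction uses the step count `t` starting from 1.

```python
import numpy as np

def adam_step(theta, g, m, v, t, eta=0.001, beta_m=0.9, beta_v=0.999, eps=1e-8):
    """One ADAM update. Returns the new (theta, m, v)."""
    m = beta_m * m + (1 - beta_m) * g        # first-moment moving average
    v = beta_v * v + (1 - beta_v) * g**2     # second-moment moving average
    m_hat = m / (1 - beta_m**t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta_v**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Sanity check: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    g = 2 * theta
    theta, m, v = adam_step(theta, g, m, v, t, eta=0.05)
```

Because \(\hat{m}_t / \sqrt{\hat{v}_t}\) is roughly sign-like in magnitude, each step moves at most about \(\eta\), so with a constant step size the iterate settles near the minimum but may dither within a band of width on the order of \(\eta\).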
Links to this note:
- Current Learning Objectives
- schwarzer2021pretraining: Pretraining Representations for Data-Efficient Reinforcement Learning
- Org Roam
- Curiosity
- Pretraining for Reinforcement Learning
- Pretraining Representations for Data-Efficient Reinforcement Learning
- Job Hunt
- Inbox
- Upperbound 2023
- Python
- Zuckerman Institute
- Additivity
- Actor Critic
- Active Learning
- Company
- Large Language Models
- GPT4
- OpenAI
- GPT3
- ChatGPT
- Incentive Salience
- Shapley Values
- Cognitive Revolution
- Developmental Reinforcement Learning
- Reinforcement Learning in the Brain
- Incentive Salience
- Reproducibility in Science
- Deep Learning
- Bootstrapping (statistics)
- bouthillier2019unreproducible: Unreproducible Research is Reproducible
- Resampling
- Policy
- Off-policy Reinforcement Learning
- Connectionist Network
- henderson2018deep: Deep Reinforcement Learning That Matters
- goodman2016what: What Does Research Reproducibility Mean?
- Neurons
- colombo2014deep: Deep and beautiful. The reward prediction error hypothesis of dopamine
- Deep and beautiful. The reward prediction error hypothesis of dopamine
- Peirce Semiotic
- Causality
- Order Theory
- Partially Ordered Set
- Infimum and Supremum
- russell2004history: History of Western Philosophy
- russellhistory: History of Western Philosophy
- cogprints316: Facing Up to the Problem of Consciousness
- roy2018editorial: Editorial: Representation in the Brain
- sternberg2016cognitive: Cognitive Psychology
- Behaviorism
- Psychology
- Behavioral Science
- Linear Algebra
- Calculus
- niv2009reinforcement: Reinforcement learning in the brain
- Dopamine
- Neurotransmitter
- Dopaminergic Neurons
- Bellman Equation
- Autocorrelation
- Correlation
- Interview Review Material
- Linear Regression
- Atari
- Backpropagation Through Time
- badia2020agent57: Agent57: Outperforming the Atari Human Benchmark
- barreto2018successor: Successor Features for Transfer in Reinforcement Learning
- Critterbot
- dayan1993improving: Improving Generalization for Temporal Difference Learning: The Successor Representation
- General Value Functions
- gonzalez-soto2019reinforcement: Reinforcement Learning is not a Causal problem
- hochreiter1997long: LONG SHORT-TERM MEMORY
- hopfield1985neural: ``Neural'' computation of decisions in optimization problems
- huang2011predictive: Predictive Coding
- james2004learning: Learning and Discovery of Predictive State Representations in Dynamical Systems with Reset
- John Hopfield
- kearney2019making: Making Meaning: Semiotics Within Predictive Knowledge Architectures
- kostas2019asynchronous: Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
- Lagrange multipliers
- lehnert2017advantages: Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning
- littman2002predictive: Predictive Representations of State
- machado2018revisiting: Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
- Maximum Likelihood Estimation
- mnih2014recurrent: Recurrent Models of Visual Attention
- Model-based RL
- Mountain Car
- Predictive Knowledge
- Reinforcement Learning: An Introduction
- scholkopf2019causality: Causality for Machine Learning
- singh2003learning: Learning Predictive State Representations
- spratling2017review: A review of predictive coding algorithms
- subramanian2020approximate: Approximate information state for approximate planning and reinforcement learning in partially observed systems
- Support Vector Machines
- sutton2020john: John McCarthy's definition of intelligence
- veeriah2019discovery: Discovery of Useful Questions as Auxiliary Tasks
- wang2017learning: Learning to reinforcement learn
- zhang2021learning: Learning Causal State Representations of Partially Observable Environments
- Sufficient Statistic
- Auxiliary Tasks
- wu2016multiplicative: On Multiplicative Integration with Recurrent Neural Networks
- white2017unifying: Unifying Task Specification in Reinforcement Learning
- vanhasselt2015learning: Learning to Predict Independent of Span
- Unsupervised Learning
- Types of Learning
- Taylor Series Expansion
- synofzik2013experience: The experience of agency: an interplay between prediction and postdiction
- SVD
- sutskever2011generating: Generating text with recurrent neural networks
- Supervised Learning
- Subbasis
- stock2004short: A short history of ideo-motor action
- Stochastic Processes
- Stationary Point
- soga2009predictive: Predictive and postdictive mechanisms jointly contribute to visual awareness
- Sigmoid Function
- Semiotic
- Semi-supervised Learning
- Self-supervised Learning
- Rosenbrock function
- Richard Ernest Bellman
- Reward
- rao1999predictive: Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects
- Puddle World
- Probability via Expectations
- Principal Component Analysis
- Pragmatism
- POMDP
- Policy Improvement
- Perpendicular Distance
- Optimization
- mohamed2019monte: Monte Carlo Gradient Estimation in Machine Learning
- Metric Space
- Markov Decision Process
- Linear Programming
- Linear Map
- Laplace Transform
- Kolmogorov Complexity
- KL Divergence
- Kernel Function
- Integral Transform
- Inner Product Space
- Hypothesis
- Homogeneity
- Hoeffding Inequality
- Hilbert Space
- GRU
- Gradient Descent
- Function Space
- Experience Replay
- Environments
- Empirical Risk Minimization
- Eigenvalues and Eigenvectors
- Efficient Coding
- Dynamic Programming
- Discrete
- Dirac Delta Function
- Dimensionality Reduction
- DeepMind Lab
- Cross-Entropy
- Converges
- Control Theory
- Continuous
- Compute Canada
- Cognitive Science
- Classification
- chung2014empirical: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Cauchy Criterion
- byrd2019what: What is the Effect of Importance Weighting in Deep Learning?
- Biased Competition
- Behavior-Suite
- Backpropagation