Current Learning Objectives
\( \newcommand{\states}{\mathcal{S}} \newcommand{\actions}{\mathcal{A}} \newcommand{\observations}{\mathcal{O}} \newcommand{\rewards}{\mathcal{R}} \newcommand{\traces}{\mathbf{e}} \newcommand{\transition}{P} \newcommand{\reals}{\mathbb{R}} \newcommand{\naturals}{\mathbb{N}} \newcommand{\complexs}{\mathbb{C}} \newcommand{\field}{\mathbb{F}} \newcommand{\numfield}{\mathbb{F}} \newcommand{\expected}{\mathbb{E}} \newcommand{\var}{\mathbb{V}} \newcommand{\by}{\times} \newcommand{\partialderiv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\defineq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\defeq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\eye}{\Imat} \newcommand{\hadamard}{\odot} \newcommand{\trans}{\top} \newcommand{\inv}{{-1}} \newcommand{\argmax}{\operatorname{argmax}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\avec}{\mathbf{a}} \newcommand{\bvec}{\mathbf{b}} \newcommand{\cvec}{\mathbf{c}} \newcommand{\dvec}{\mathbf{d}} \newcommand{\evec}{\mathbf{e}} \newcommand{\fvec}{\mathbf{f}} \newcommand{\gvec}{\mathbf{g}} \newcommand{\hvec}{\mathbf{h}} \newcommand{\ivec}{\mathbf{i}} \newcommand{\jvec}{\mathbf{j}} \newcommand{\kvec}{\mathbf{k}} \newcommand{\lvec}{\mathbf{l}} \newcommand{\mvec}{\mathbf{m}} \newcommand{\nvec}{\mathbf{n}} \newcommand{\ovec}{\mathbf{o}} \newcommand{\pvec}{\mathbf{p}} \newcommand{\qvec}{\mathbf{q}} \newcommand{\rvec}{\mathbf{r}} \newcommand{\svec}{\mathbf{s}} \newcommand{\tvec}{\mathbf{t}} \newcommand{\uvec}{\mathbf{u}} \newcommand{\vvec}{\mathbf{v}} \newcommand{\wvec}{\mathbf{w}} \newcommand{\xvec}{\mathbf{x}} \newcommand{\yvec}{\mathbf{y}} \newcommand{\zvec}{\mathbf{z}} \newcommand{\Amat}{\mathbf{A}} \newcommand{\Bmat}{\mathbf{B}} \newcommand{\Cmat}{\mathbf{C}} \newcommand{\Dmat}{\mathbf{D}} \newcommand{\Emat}{\mathbf{E}} \newcommand{\Fmat}{\mathbf{F}} \newcommand{\Gmat}{\mathbf{G}} \newcommand{\Hmat}{\mathbf{H}} \newcommand{\Imat}{\mathbf{I}} \newcommand{\Jmat}{\mathbf{J}} \newcommand{\Kmat}{\mathbf{K}} \newcommand{\Lmat}{\mathbf{L}} \newcommand{\Mmat}{\mathbf{M}} \newcommand{\Nmat}{\mathbf{N}} \newcommand{\Omat}{\mathbf{O}} \newcommand{\Pmat}{\mathbf{P}} \newcommand{\Qmat}{\mathbf{Q}} \newcommand{\Rmat}{\mathbf{R}} \newcommand{\Smat}{\mathbf{S}} \newcommand{\Tmat}{\mathbf{T}} \newcommand{\Umat}{\mathbf{U}} \newcommand{\Vmat}{\mathbf{V}} \newcommand{\Wmat}{\mathbf{W}} \newcommand{\Xmat}{\mathbf{X}} \newcommand{\Ymat}{\mathbf{Y}} \newcommand{\Zmat}{\mathbf{Z}} \newcommand{\Sigmamat}{\boldsymbol{\Sigma}} \newcommand{\identity}{\Imat} \newcommand{\epsilonvec}{\boldsymbol{\epsilon}} \newcommand{\thetavec}{\boldsymbol{\theta}} \newcommand{\phivec}{\boldsymbol{\phi}} \newcommand{\muvec}{\boldsymbol{\mu}} \newcommand{\sigmavec}{\boldsymbol{\sigma}} \newcommand{\jacobian}{\mathbf{J}} \newcommand{\ind}{\perp!!!!\perp} \newcommand{\bigoh}{\text{O}} \)
This note serves as a place for me to track my current learning objectives. It is partially an agenda file and partially a note file.
Projects
TODO-LO Power Systems Control POWER
- TODO Take the basic framework learned while writing the grant and incorporate it into these notes. @write
- TODO Tie the different parts of the problem space to reinforcement learning needs. @think
- TODO Enter literature notes for the papers already read. @write
Incentive Salience
This is an alternative to the Reward Prediction-Error Hypothesis of Dopamine (RPEH), and it may explain some data better.
Developmental Reinforcement Learning, Curiosity, and Pretraining for Reinforcement Learning
Initial
I want to know more about learning how to behave in order to learn.
Some papers suggested by ChatGPT:
- “Curiosity-driven Exploration by Self-supervised Prediction” by Pathak et al. (2017) introduces a DRL method that uses curiosity-driven exploration to discover new behaviors and skills.
- “Emergence of Grounded Compositional Language in Multi-Agent Populations” by Mordatch and Abbeel (2018) demonstrates how DRL can be used to enable multi-agent populations to develop their own compositional language for communication.
- “Open-ended Learning in Symmetric Zero-sum Games” by Balduzzi et al. (2019) studies open-ended learning in symmetric zero-sum games and proposes algorithms for building increasingly strong and diverse populations of agents.
- “Reinforcement Learning with Unsupervised Auxiliary Tasks” by Jaderberg et al. (2016) introduces a DRL agent (UNREAL) that adds unsupervised auxiliary tasks, such as pixel control and reward prediction, to improve representation learning and data efficiency.
- “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” by Finn et al. (2017) proposes a meta-learning approach (MAML) that enables agents to adapt to new tasks more efficiently by learning to learn.
This project has most recently started tearing into the pre-training literature (i.e., Pretraining for Reinforcement Learning). There is a lot of interesting work in that direction, and it might be a good place to start: develop a pre-training agent, then play around with the data distributions used to train it.
This opens up many more things to read:
- CANCELLED (Xie et al. 2022) READ
- CANCELLED (Balestriero et al. 2023) READ
- CANCELLED schwarzer2021pretraining: Pretraining Representations for Data-Efficient Reinforcement Learning READ
- CANCELLED (Sontakke et al. 2021) READ
- CANCELLED (Pathak et al. 2017) READ
- CANCELLED (Berlyne 1960) READ
TODO-LO Neural Operators
Neural operators learn mappings between infinite-dimensional (function) spaces. They may be related to kernel methods, which likewise work by mapping into infinite-dimensional feature spaces.
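As a rough sketch of the usual layer form (my paraphrase of the kernel-integral formulation in the neural-operator literature; the symbols \( \kappa_\theta \), \( \Wmat \), and \( \sigma \) are my own notation): \[ v_{t+1}(x) = \sigma\!\left( \Wmat\, v_t(x) + \int_{D} \kappa_\theta(x, y)\, v_t(y)\, dy \right), \] where \( v_t : D \to \reals^{d} \) is the current function-valued representation and \( \kappa_\theta \) is a learned kernel; the integral plays the role a weight matrix plays in a standard layer.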
Topics
More on the different research areas of Reinforcement Learning in the Brain
While niv2009reinforcement: Reinforcement learning in the brain is a good start, there is much more to do and learn here. The focus should really be on the Reward Prediction-Error Hypothesis of Dopamine and how generally it applies. This also relates to Incentive Salience and where the two hypotheses diverge or converge in explaining the same data.
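For reference, the hypothesis identifies phasic dopamine activity with the temporal-difference error (standard TD formulation; \( V \) is the state-value function and \( \gamma \) the discount factor): \[ \delta_t \defeq R_{t+1} + \gamma V(S_{t+1}) - V(S_t). \]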
Transformer
Offline Reinforcement Learning
Learning Large Models
- TODO (Weng 2021)
TODO Unorganized Topics [0/12]
TODO colombo2014deep: Deep and beautiful. The reward prediction error hypothesis of dopamine
TODO (Zhang et al. 2009)
TODO (Orvieto et al. 2023)
TODO Actor-critic algorithms
TODO Policy Gradient Methods
TODO Spiking neural networks
https://cnvrg.io/spiking-neural-networks/
TODO Visual System
TODO Control Theory
TODO Free-Energy Principle
TODO Cerebellum
TODO Neuro/Psych background reading. RESOURCE
https://docs.google.com/document/d/111-4SPQ1kEg_yrMfud_26rK7fBHpol59iDnZ9BYuzNc/edit
TODO Mary’s Room thought experiment by Frank Jackson
Basic Notes
TODO Integral Calculus
TODO Derive the Bellman Equation for general decision problems.
TODO Value Function
TODO Dynamic Programming
TODO Backpropagation
TODO Backpropagation Through Time
IN-PROGRESS Recurrent Neural Networks
- TODO GRU
TODO Neurotransmitter
TODO Dopaminergic Neurons
TODO Dopamine
IN-PROGRESS Basic Inequalities [2/3]
- DONE Hoeffding's Inequality
- DONE Markov's Inequality
- IN-PROGRESS Chebyshev's Inequality (statements recapped below)
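A quick reference for the three (standard statements, as I recall them): \[ \Prob(X \ge a) \le \frac{\expected[X]}{a} \quad \text{(Markov: } X \ge 0,\ a > 0\text{)}, \] \[ \Prob\big(|X - \expected[X]| \ge a\big) \le \frac{\var(X)}{a^2} \quad \text{(Chebyshev: } a > 0\text{)}, \] \[ \Prob\!\left( \textstyle\sum_{i=1}^n (X_i - \expected[X_i]) \ge t \right) \le \exp\!\left( \frac{-2 t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right) \quad \text{(Hoeffding: independent } X_i \in [a_i, b_i],\ t > 0\text{)}. \]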