Current Learning Objectives

\( \newcommand{\states}{\mathcal{S}} \newcommand{\actions}{\mathcal{A}} \newcommand{\observations}{\mathcal{O}} \newcommand{\rewards}{\mathcal{R}} \newcommand{\traces}{\mathbf{e}} \newcommand{\transition}{P} \newcommand{\reals}{\mathbb{R}} \newcommand{\naturals}{\mathbb{N}} \newcommand{\complexs}{\mathbb{C}} \newcommand{\field}{\mathbb{F}} \newcommand{\numfield}{\mathbb{F}} \newcommand{\expected}{\mathbb{E}} \newcommand{\var}{\mathbb{V}} \newcommand{\by}{\times} \newcommand{\partialderiv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\defineq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\defeq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\eye}{\Imat} \newcommand{\hadamard}{\odot} \newcommand{\trans}{\top} \newcommand{\inv}{{-1}} \newcommand{\argmax}{\operatorname{argmax}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\avec}{\mathbf{a}} \newcommand{\bvec}{\mathbf{b}} \newcommand{\cvec}{\mathbf{c}} \newcommand{\dvec}{\mathbf{d}} \newcommand{\evec}{\mathbf{e}} \newcommand{\fvec}{\mathbf{f}} \newcommand{\gvec}{\mathbf{g}} \newcommand{\hvec}{\mathbf{h}} \newcommand{\ivec}{\mathbf{i}} \newcommand{\jvec}{\mathbf{j}} \newcommand{\kvec}{\mathbf{k}} \newcommand{\lvec}{\mathbf{l}} \newcommand{\mvec}{\mathbf{m}} \newcommand{\nvec}{\mathbf{n}} \newcommand{\ovec}{\mathbf{o}} \newcommand{\pvec}{\mathbf{p}} \newcommand{\qvec}{\mathbf{q}} \newcommand{\rvec}{\mathbf{r}} \newcommand{\svec}{\mathbf{s}} \newcommand{\tvec}{\mathbf{t}} \newcommand{\uvec}{\mathbf{u}} \newcommand{\vvec}{\mathbf{v}} \newcommand{\wvec}{\mathbf{w}} \newcommand{\xvec}{\mathbf{x}} \newcommand{\yvec}{\mathbf{y}} \newcommand{\zvec}{\mathbf{z}} \newcommand{\Amat}{\mathbf{A}} \newcommand{\Bmat}{\mathbf{B}} \newcommand{\Cmat}{\mathbf{C}} \newcommand{\Dmat}{\mathbf{D}} \newcommand{\Emat}{\mathbf{E}} \newcommand{\Fmat}{\mathbf{F}} \newcommand{\Gmat}{\mathbf{G}} \newcommand{\Hmat}{\mathbf{H}} \newcommand{\Imat}{\mathbf{I}} \newcommand{\Jmat}{\mathbf{J}} \newcommand{\Kmat}{\mathbf{K}} \newcommand{\Lmat}{\mathbf{L}} \newcommand{\Mmat}{\mathbf{M}} \newcommand{\Nmat}{\mathbf{N}} \newcommand{\Omat}{\mathbf{O}} \newcommand{\Pmat}{\mathbf{P}} \newcommand{\Qmat}{\mathbf{Q}} \newcommand{\Rmat}{\mathbf{R}} \newcommand{\Smat}{\mathbf{S}} \newcommand{\Tmat}{\mathbf{T}} \newcommand{\Umat}{\mathbf{U}} \newcommand{\Vmat}{\mathbf{V}} \newcommand{\Wmat}{\mathbf{W}} \newcommand{\Xmat}{\mathbf{X}} \newcommand{\Ymat}{\mathbf{Y}} \newcommand{\Zmat}{\mathbf{Z}} \newcommand{\Sigmamat}{\boldsymbol{\Sigma}} \newcommand{\identity}{\Imat} \newcommand{\epsilonvec}{\boldsymbol{\epsilon}} \newcommand{\thetavec}{\boldsymbol{\theta}} \newcommand{\phivec}{\boldsymbol{\phi}} \newcommand{\muvec}{\boldsymbol{\mu}} \newcommand{\sigmavec}{\boldsymbol{\sigma}} \newcommand{\jacobian}{\mathbf{J}} \newcommand{\ind}{\perp\!\!\!\!\perp} \newcommand{\bigoh}{\text{O}} \)

This note is where I track my current learning objectives; it is part agenda file, part note file.

Projects

TODO-LO Power Systems Control POWER

  • TODO Take the basic framework learned while writing the grant and transplant it into these notes. @write
  • TODO Tie the different parts of the problem space to reinforcement learning needs. @think
  • TODO Enter the literature notes already taken on papers. @write

Incentive Salience

Incentive salience is an alternative to the reward prediction-error hypothesis (RPEH) of dopamine, and could potentially explain some data better.

Developmental Reinforcement Learning, Curiosity, and Pretraining for Reinforcement Learning

  • Initial

    I want to know more about how agents can learn to behave in ways that facilitate further learning.

    Some papers suggested by ChatGPT:

    1. “Curiosity-driven Exploration by Self-supervised Prediction” by Pathak et al. (2017) introduces a DRL method that drives exploration with an intrinsic reward based on the prediction error of a learned forward model (a minimal sketch of this idea follows the list).
    2. “Emergence of Grounded Compositional Language in Multi-Agent Populations” by Mordatch and Abbeel (2018) demonstrates how DRL can enable multi-agent populations to develop their own compositional language for communication.
    3. “Open-ended Learning in Symmetric Zero-sum Games” by Balduzzi et al. (2019) proposes objectives and algorithms for open-ended learning in two-player symmetric zero-sum games.
    4. “Reinforcement Learning with Unsupervised Auxiliary Tasks” by Jaderberg et al. (2016) introduces the UNREAL agent, which adds unsupervised auxiliary tasks (e.g., pixel control and reward prediction) to improve representations and data efficiency.
    5. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” by Finn et al. (2017) proposes MAML, which enables agents to adapt to new tasks efficiently by learning to learn.
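
    To make item 1 concrete, here is a minimal sketch of the curiosity bonus from Pathak et al. (2017): the agent is rewarded in proportion to how poorly a learned forward model predicts the features of the next state. All shapes, weights, and function names below are illustrative stand-ins (random, untrained), not the paper's actual architecture.

      import numpy as np

      rng = np.random.default_rng(0)
      OBS, FEAT, ACT = 32, 16, 4

      # Stand-ins for the learned encoder phi and forward model
      # (hypothetical weights; in the paper both are trained networks).
      W_enc = rng.normal(size=(FEAT, OBS))
      W_fwd = rng.normal(size=(FEAT, FEAT + ACT))

      def phi(obs):
          """Encode a raw observation into a feature vector."""
          return np.tanh(W_enc @ obs)

      def predict_next(feat, action_onehot):
          """Forward model: predict phi(s_{t+1}) from phi(s_t) and a_t."""
          return np.tanh(W_fwd @ np.concatenate([feat, action_onehot]))

      def intrinsic_reward(obs, action_onehot, next_obs, eta=0.5):
          """Curiosity bonus: scaled squared prediction error of the forward model."""
          err = predict_next(phi(obs), action_onehot) - phi(next_obs)
          return eta * 0.5 * float(err @ err)

      # One fake transition: poorly predicted transitions yield larger bonuses.
      s, s_next = rng.normal(size=OBS), rng.normal(size=OBS)
      print(intrinsic_reward(s, np.eye(ACT)[1], s_next))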

TODO-LO Neural Operators

Neural operators learn mappings between infinite-dimensional (function) spaces. They may be related to kernel methods, which likewise involve maps into and out of infinite-dimensional feature spaces.
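
To make this concrete, here is a minimal sketch of one layer in the style of the Fourier neural operator (Li et al., 2021): transform a function sampled on a grid into frequency space, apply a learned linear map to the lowest modes, transform back, and add a pointwise term. The weights below are random stand-ins for learned parameters, and the grid/mode sizes are arbitrary choices for illustration.

  import numpy as np

  rng = np.random.default_rng(0)
  N_GRID, N_MODES = 64, 8

  # Hypothetical learned parameters: complex spectral weights for the lowest
  # Fourier modes, plus a scalar pointwise (local) weight.
  R = rng.normal(size=N_MODES) + 1j * rng.normal(size=N_MODES)
  w = rng.normal()

  def spectral_layer(u):
      """Apply one FNO-style layer to a function u sampled on a 1-D grid."""
      u_hat = np.fft.rfft(u)                   # to frequency space
      out_hat = np.zeros_like(u_hat)
      out_hat[:N_MODES] = R * u_hat[:N_MODES]  # linear map on low modes only
      spectral = np.fft.irfft(out_hat, n=len(u))
      return np.maximum(spectral + w * u, 0.0) # add local term, then ReLU

  # Example: apply the layer to a sampled sine wave. Because the layer acts
  # on Fourier modes, the same parameters apply at any grid resolution.
  x = np.linspace(0.0, 2 * np.pi, N_GRID, endpoint=False)
  print(spectral_layer(np.sin(x))[:4])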

Topics

More on the different research areas of Reinforcement Learning in the Brain

While niv2009reinforcement: Reinforcement learning in the brain is a good start, there is much more to learn here. The focus should really be on the reward prediction-error hypothesis (RPEH) of dopamine and how generally it applies. This also relates to Incentive Salience, and to where the two hypotheses diverge or converge in explaining the same data.
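
Concretely, the computational core of the RPEH is the temporal-difference error, which phasic dopamine activity is hypothesized to report:

\[ \delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t), \]

i.e., the discrepancy between the reward-plus-bootstrapped value actually observed and the value previously predicted.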

Transformer

Offline Reinforcement Learning

Learning Large Models

TODO Unorganized Topics [0/12]

TODO colombo2014deep: Deep and beautiful. The reward prediction error hypothesis of dopamine

TODO (Zhang et al. 2009)

TODO (Orvieto et al. 2023)

TODO Actor-critic algorithms

TODO Policy Gradient Methods

TODO Spiking neural networks

[2021-07-30 Fri] https://cnvrg.io/spiking-neural-networks/

TODO Visual System

TODO Control Theory

TODO Free-Energy Principle

TODO Cerebellum

TODO Neuro/Psych background reading. RESOURCE

https://docs.google.com/document/d/111-4SPQ1kEg_yrMfud_26rK7fBHpol59iDnZ9BYuzNc/edit

TODO Mary’s Room thought experiment by Frank Jackson

Basic Notes

TODO Integral Calculus

TODO Derive the Bellman Equation for general decision problems.
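
As a starting point before the general case, the standard derivation for a discounted MDP (notation follows the macros at the top of this note): define the value of a state as the expected return under policy \( \pi \), split off the first reward, and condition on the first action and transition,

\[ v_\pi(s) \defeq \expected_\pi\!\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s \right] = \expected_\pi\left[ R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s \right] = \sum_{a \in \actions} \pi(a \mid s) \sum_{s' \in \states} \transition(s' \mid s, a)\left[ r(s, a, s') + \gamma v_\pi(s') \right]. \]

The derivation for general decision problems should recover this as a special case.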

TODO Value Function

TODO Dynamic Programming

TODO Backpropagation

TODO Backpropagation Through Time

IN-PROGRESS Recurrent Neural Networks

  • TODO GRU

TODO Neurotransmitter

TODO Dopaminergic Neurons

TODO Dopamine

IN-PROGRESS Basic Inequalities [2/3]

  • IN-PROGRESS Chebyshev

TODO Banach Spaces and Convergence

TODO Bayesian Vs Frequentists

TODO Moment Generating Function

TODO Chernoff Bounds

TODO Scientific Method

TODO Kernel Functions

TODO Neuron

Questions

How does loss of plasticity affect exploration?

Archive

DONE niv2009reinforcement: Reinforcement learning in the brain
