Current Learning Objectives

\( \newcommand{\states}{\mathcal{S}} \newcommand{\actions}{\mathcal{A}} \newcommand{\observations}{\mathcal{O}} \newcommand{\rewards}{\mathcal{R}} \newcommand{\traces}{\mathbf{e}} \newcommand{\transition}{P} \newcommand{\reals}{\mathbb{R}} \newcommand{\naturals}{\mathbb{N}} \newcommand{\complexs}{\mathbb{C}} \newcommand{\field}{\mathbb{F}} \newcommand{\numfield}{\mathbb{F}} \newcommand{\expected}{\mathbb{E}} \newcommand{\var}{\mathbb{V}} \newcommand{\by}{\times} \newcommand{\partialderiv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\defineq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\defeq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\eye}{\Imat} \newcommand{\hadamard}{\odot} \newcommand{\trans}{\top} \newcommand{\inv}{{-1}} \newcommand{\argmax}{\operatorname{argmax}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\avec}{\mathbf{a}} \newcommand{\bvec}{\mathbf{b}} \newcommand{\cvec}{\mathbf{c}} \newcommand{\dvec}{\mathbf{d}} \newcommand{\evec}{\mathbf{e}} \newcommand{\fvec}{\mathbf{f}} \newcommand{\gvec}{\mathbf{g}} \newcommand{\hvec}{\mathbf{h}} \newcommand{\ivec}{\mathbf{i}} \newcommand{\jvec}{\mathbf{j}} \newcommand{\kvec}{\mathbf{k}} \newcommand{\lvec}{\mathbf{l}} \newcommand{\mvec}{\mathbf{m}} \newcommand{\nvec}{\mathbf{n}} \newcommand{\ovec}{\mathbf{o}} \newcommand{\pvec}{\mathbf{p}} \newcommand{\qvec}{\mathbf{q}} \newcommand{\rvec}{\mathbf{r}} \newcommand{\svec}{\mathbf{s}} \newcommand{\tvec}{\mathbf{t}} \newcommand{\uvec}{\mathbf{u}} \newcommand{\vvec}{\mathbf{v}} \newcommand{\wvec}{\mathbf{w}} \newcommand{\xvec}{\mathbf{x}} \newcommand{\yvec}{\mathbf{y}} \newcommand{\zvec}{\mathbf{z}} \newcommand{\Amat}{\mathbf{A}} \newcommand{\Bmat}{\mathbf{B}} \newcommand{\Cmat}{\mathbf{C}} \newcommand{\Dmat}{\mathbf{D}} \newcommand{\Emat}{\mathbf{E}} \newcommand{\Fmat}{\mathbf{F}} \newcommand{\Gmat}{\mathbf{G}} \newcommand{\Hmat}{\mathbf{H}} \newcommand{\Imat}{\mathbf{I}} \newcommand{\Jmat}{\mathbf{J}} \newcommand{\Kmat}{\mathbf{K}} \newcommand{\Lmat}{\mathbf{L}} \newcommand{\Mmat}{\mathbf{M}} \newcommand{\Nmat}{\mathbf{N}} \newcommand{\Omat}{\mathbf{O}} \newcommand{\Pmat}{\mathbf{P}} \newcommand{\Qmat}{\mathbf{Q}} \newcommand{\Rmat}{\mathbf{R}} \newcommand{\Smat}{\mathbf{S}} \newcommand{\Tmat}{\mathbf{T}} \newcommand{\Umat}{\mathbf{U}} \newcommand{\Vmat}{\mathbf{V}} \newcommand{\Wmat}{\mathbf{W}} \newcommand{\Xmat}{\mathbf{X}} \newcommand{\Ymat}{\mathbf{Y}} \newcommand{\Zmat}{\mathbf{Z}} \newcommand{\Sigmamat}{\boldsymbol{\Sigma}} \newcommand{\identity}{\Imat} \newcommand{\epsilonvec}{\boldsymbol{\epsilon}} \newcommand{\thetavec}{\boldsymbol{\theta}} \newcommand{\phivec}{\boldsymbol{\phi}} \newcommand{\muvec}{\boldsymbol{\mu}} \newcommand{\sigmavec}{\boldsymbol{\sigma}} \newcommand{\jacobian}{\mathbf{J}} \newcommand{\ind}{\perp!!!!\perp} \newcommand{\bigoh}{\text{O}} \)

This note serves as a place for me to track my current learning objectives. It is partially an agenda file and partially a note file.

Projects

Incentive Salience

This is an alternative to the Reward Prediction-Error Hypothesis (RPEH) of dopamine, and could potentially explain some of the data better.

Developmental Reinforcement Learning, Curiosity, and Pretraining for Reinforcement Learning

  • Initial

    I want to know more about learning how to behave in order to learn.

    Some papers suggested by ChatGPT:

    1. “Curiosity-driven Exploration by Self-supervised Prediction” by Pathak et al. (2017) introduces the Intrinsic Curiosity Module, which rewards the agent with the prediction error of a learned forward model so that it seeks out states it cannot yet predict (a minimal sketch of this idea follows the list).
    2. “Emergence of Grounded Compositional Language in Multi-Agent Populations” by Mordatch and Abbeel (2018) shows how populations of RL agents can develop their own grounded, compositional language for communication.
    3. “Open-ended Learning in Symmetric Zero-sum Games” by Balduzzi et al. (2019) studies open-ended learning in two-player symmetric zero-sum games and proposes population-based objectives for continually discovering new, non-dominated strategies.
    4. “Reinforcement Learning with Unsupervised Auxiliary Tasks” by Jaderberg et al. (2016) introduces the UNREAL agent, which adds unsupervised auxiliary tasks (e.g., pixel control and reward prediction) to improve representation learning and data efficiency.
    5. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” by Finn et al. (2017), i.e., MAML, learns an initialization from which an agent can adapt to new tasks, including RL tasks, with only a few gradient steps: learning to learn.
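
    As a concrete reference for item 1, below is a minimal sketch of an ICM-style curiosity bonus, using PyTorch for concreteness: the intrinsic reward is the prediction error of a learned forward model in feature space. The class and function names (FeatureEncoder, ForwardModel, curiosity_bonus) and the layer sizes are my own placeholders, not taken from the paper's code.

      import torch
      import torch.nn as nn

      class FeatureEncoder(nn.Module):
          """Maps raw observations to a compact feature vector phi(s)."""
          def __init__(self, obs_dim: int, feat_dim: int = 32):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))

          def forward(self, obs):
              return self.net(obs)

      class ForwardModel(nn.Module):
          """Predicts phi(s_{t+1}) from phi(s_t) and the action taken."""
          def __init__(self, feat_dim: int, act_dim: int):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(feat_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))

          def forward(self, phi_s, action):
              return self.net(torch.cat([phi_s, action], dim=-1))

      def curiosity_bonus(encoder, fwd_model, obs, action, next_obs, scale=1.0):
          """Intrinsic reward: squared error between predicted and actual next features."""
          phi_s, phi_next = encoder(obs), encoder(next_obs)
          pred_next = fwd_model(phi_s, action)
          return scale * 0.5 * (pred_next - phi_next).pow(2).sum(dim=-1)

      # Usage sketch: add the bonus to the extrinsic reward before the policy update,
      # and train fwd_model to minimize the same prediction error (the paper also
      # trains the encoder with an inverse-dynamics loss rather than this one).
      enc, fwd = FeatureEncoder(obs_dim=8), ForwardModel(feat_dim=32, act_dim=2)
      obs, act, next_obs = torch.randn(4, 8), torch.randn(4, 2), torch.randn(4, 8)
      r_intrinsic = curiosity_bonus(enc, fwd, obs, act, next_obs)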

Topics

More on the different research areas of Reinforcement Learning in the Brain

While niv2009reinforcement: Reinforcement learning in the brain is a good start, there is much more to learn here. The focus should really be on the Reward Prediction-Error Hypothesis (RPEH) of Dopamine and how generally it applies. This also relates to Incentive Salience, and to where the two hypotheses diverge or converge in explaining the same data.
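
As a concrete anchor for the RPEH, the quantity phasic dopamine is hypothesized to report is the temporal-difference (TD) error; a minimal statement for state values (standard TD notation, not specific to any one paper cited here):

\[ \delta_t \defeq R_{t+1} + \gamma V(S_{t+1}) - V(S_t), \qquad V(S_t) \leftarrow V(S_t) + \alpha\, \delta_t . \]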

Transformers

Self-Supervised Learning Objectives

Offline Reinforcement Learning

Unorganized Topics

TODO Banach Spaces and Convergence

TODO Pretraining for Reinforcement Learning

TODO Basic Inequalities [0/3]

  • TODO Chebyshev (statement sketched below)
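
For reference while working through this item, Chebyshev's inequality in the note's notation: for any random variable \( X \) with finite variance and any \( a > 0 \),

\[ \Prob\big( \lvert X - \expected[X] \rvert \geq a \big) \leq \frac{\var[X]}{a^{2}} . \]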

TODO Bayesian Vs Frequentists

TODO colombo2014deep: Deep and beautiful. The reward prediction error hypothesis of dopamine

TODO (Zhang et al. 2009)

TODO Neurotransmitter

TODO Dopaminergic Neurons

TODO Dopamine

IN-PROGRESS niv2009reinforcement: Reinforcement learning in the brain

IN-PROGRESS Recurrent Neural Networks

  • TODO GRU (update equations sketched below)
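
A compact reference for the GRU item above; sign conventions for the update gate vary across references, and here \( \zvec_t \) gates the new candidate state (biases omitted):

\[
\begin{aligned}
\zvec_t &= \sigma(\Wmat_z \xvec_t + \Umat_z \hvec_{t-1}), &
\rvec_t &= \sigma(\Wmat_r \xvec_t + \Umat_r \hvec_{t-1}), \\
\tilde{\hvec}_t &= \tanh\!\big(\Wmat_h \xvec_t + \Umat_h (\rvec_t \hadamard \hvec_{t-1})\big), &
\hvec_t &= (1 - \zvec_t) \hadamard \hvec_{t-1} + \zvec_t \hadamard \tilde{\hvec}_t .
\end{aligned}
\]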

TODO Actor-critic algorithms

TODO Policy Gradient Methods
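
These two items connect: the score-function (REINFORCE) form of the policy gradient is below, and actor-critic methods replace the sampled return \( G_t \) with a critic-supplied TD error \( \delta_t \) (details of discounting and the state distribution omitted):

\[ \nabla_{\thetavec} J(\thetavec) = \expected_{\pi}\big[ G_t \, \nabla_{\thetavec} \ln \pi(A_t \mid S_t, \thetavec) \big], \qquad G_t \;\to\; \delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t) . \]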

TODO Derive the Bellman Equation for general decision problems.
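
A minimal sketch for the familiar discounted MDP case (the point of this item is to redo it for more general decision problems): start from the definition of the value function, unroll the return one step, and use the Markov property with the tower rule.

\[
\begin{aligned}
v_\pi(s) &\defeq \expected_\pi[\, G_t \mid S_t = s \,], \qquad G_t = R_{t+1} + \gamma G_{t+1} \\
v_\pi(s) &= \expected_\pi[\, R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s \,] \\
&= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\, \big[ r + \gamma\, v_\pi(s') \big].
\end{aligned}
\]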

TODO Value Function

TODO Dynamic Programming

TODO Backpropagation

TODO Spiking neural networks

[2021-07-30 Fri] https://cnvrg.io/spiking-neural-networks/

TODO Visual System

TODO Control Theory

TODO Free-Energy Principle

TODO Neuron

TODO Integral Calculus

TODO Cerebellum

TODO Moment Generating Function

TODO Chernoff Bounds
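
These two items connect directly: applying Markov's inequality to \( e^{tX} \) gives the generic Chernoff bound in terms of the moment generating function \( M_X(t) = \expected[e^{tX}] \), which is then tightened by optimizing over \( t \),

\[ \Prob(X \geq a) \leq e^{-ta}\, \expected\big[e^{tX}\big] = e^{-ta} M_X(t) \ \text{ for any } t > 0 \text{ with } M_X(t) < \infty, \qquad \Prob(X \geq a) \leq \inf_{t > 0} e^{-ta} M_X(t) . \]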

TODO Scientific Method

TODO Kernel Functions

TODO Neuro/Psych background reading. RESOURCE

https://docs.google.com/document/d/111-4SPQ1kEg_yrMfud_26rK7fBHpol59iDnZ9BYuzNc/edit

Questions

How does loss of plasticity affect exploration?
