Reward
\( \newcommand{\states}{\mathcal{S}} \newcommand{\actions}{\mathcal{A}} \newcommand{\observations}{\mathcal{O}} \newcommand{\rewards}{\mathcal{R}} \newcommand{\traces}{\mathbf{e}} \newcommand{\transition}{P} \newcommand{\reals}{\mathbb{R}} \newcommand{\naturals}{\mathbb{N}} \newcommand{\complexs}{\mathbb{C}} \newcommand{\field}{\mathbb{F}} \newcommand{\numfield}{\mathbb{F}} \newcommand{\expected}{\mathbb{E}} \newcommand{\var}{\mathbb{V}} \newcommand{\by}{\times} \newcommand{\partialderiv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\defineq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\defeq}{\stackrel{{\tiny\mbox{def}}}{=}} \newcommand{\eye}{\Imat} \newcommand{\hadamard}{\odot} \newcommand{\trans}{\top} \newcommand{\inv}{{-1}} \newcommand{\argmax}{\operatorname{argmax}} \newcommand{\Prob}{\mathbb{P}} \newcommand{\avec}{\mathbf{a}} \newcommand{\bvec}{\mathbf{b}} \newcommand{\cvec}{\mathbf{c}} \newcommand{\dvec}{\mathbf{d}} \newcommand{\evec}{\mathbf{e}} \newcommand{\fvec}{\mathbf{f}} \newcommand{\gvec}{\mathbf{g}} \newcommand{\hvec}{\mathbf{h}} \newcommand{\ivec}{\mathbf{i}} \newcommand{\jvec}{\mathbf{j}} \newcommand{\kvec}{\mathbf{k}} \newcommand{\lvec}{\mathbf{l}} \newcommand{\mvec}{\mathbf{m}} \newcommand{\nvec}{\mathbf{n}} \newcommand{\ovec}{\mathbf{o}} \newcommand{\pvec}{\mathbf{p}} \newcommand{\qvec}{\mathbf{q}} \newcommand{\rvec}{\mathbf{r}} \newcommand{\svec}{\mathbf{s}} \newcommand{\tvec}{\mathbf{t}} \newcommand{\uvec}{\mathbf{u}} \newcommand{\vvec}{\mathbf{v}} \newcommand{\wvec}{\mathbf{w}} \newcommand{\xvec}{\mathbf{x}} \newcommand{\yvec}{\mathbf{y}} \newcommand{\zvec}{\mathbf{z}} \newcommand{\Amat}{\mathbf{A}} \newcommand{\Bmat}{\mathbf{B}} \newcommand{\Cmat}{\mathbf{C}} \newcommand{\Dmat}{\mathbf{D}} \newcommand{\Emat}{\mathbf{E}} \newcommand{\Fmat}{\mathbf{F}} \newcommand{\Gmat}{\mathbf{G}} \newcommand{\Hmat}{\mathbf{H}} \newcommand{\Imat}{\mathbf{I}} \newcommand{\Jmat}{\mathbf{J}} \newcommand{\Kmat}{\mathbf{K}} \newcommand{\Lmat}{\mathbf{L}} 
\newcommand{\Mmat}{\mathbf{M}} \newcommand{\Nmat}{\mathbf{N}} \newcommand{\Omat}{\mathbf{O}} \newcommand{\Pmat}{\mathbf{P}} \newcommand{\Qmat}{\mathbf{Q}} \newcommand{\Rmat}{\mathbf{R}} \newcommand{\Smat}{\mathbf{S}} \newcommand{\Tmat}{\mathbf{T}} \newcommand{\Umat}{\mathbf{U}} \newcommand{\Vmat}{\mathbf{V}} \newcommand{\Wmat}{\mathbf{W}} \newcommand{\Xmat}{\mathbf{X}} \newcommand{\Ymat}{\mathbf{Y}} \newcommand{\Zmat}{\mathbf{Z}} \newcommand{\Sigmamat}{\boldsymbol{\Sigma}} \newcommand{\identity}{\Imat} \newcommand{\epsilonvec}{\boldsymbol{\epsilon}} \newcommand{\thetavec}{\boldsymbol{\theta}} \newcommand{\phivec}{\boldsymbol{\phi}} \newcommand{\muvec}{\boldsymbol{\mu}} \newcommand{\sigmavec}{\boldsymbol{\sigma}} \newcommand{\jacobian}{\mathbf{J}} \newcommand{\ind}{\perp!!!!\perp} \newcommand{\bigoh}{\text{O}} \)
Links to this note:
- Current Learning Objectives
- schwarzer2021pretraining: Pretraining Representations for Data-Efficient Reinforcement Learning
- Org Roam
- Curiosity
- Pretraining for Reinforcement Learning
- Pretraining Representations for Data-Efficient Reinforcement Learning
- Job Hunt
- Inbox
- Upperbound 2023
- Python
- Zuckerman Institute
- Additivity
- Actor Critic
- Active Learning
- Company
- Large Language Models
- GPT4
- OpenAI
- GPT3
- ChatGPT
- Incentive Salience
- Shapley Values
- Cognitive Revolution
- Developmental Reinforcement Learning
- Reinforcement Learning in the Brain
- Reproducibility in Science
- Predictive Processing
- Deep Learning
- Bootstrapping (statistics)
- bouthillier2019unreproducible: Unreproducible Research is Reproducible
- Resampling
- Policy
- Off-policy Reinforcement Learning
- Connectionist Network
- henderson2018deep: Deep Reinforcement Learning That Matters
- goodman2016what: What Does Research Reproducibility Mean?
- Neurons
- colombo2014deep: Deep and beautiful. The reward prediction error hypothesis of dopamine
- Deep and beautiful. The reward prediction error hypothesis of dopamine
- Peirce Semiotic
- Causality
- Order Theory
- Partially Ordered Set
- Infimum and Supremum
- russell2004history: History of Western Philosophy
- russellhistory: History of Western Philosophy
- cogprints316: Facing Up to the Problem of Consciousness
- roy2018editorial: Editorial: Representation in the Brain
- sternberg2016cognitive: Cognitive Psychology
- Behaviorism
- Psychology
- Behavioral Science
- Linear Algebra
- Calculus
- niv2009reinforcement: Reinforcement learning in the brain
- Dopamine
- Neurotransmitter
- Dopaminergic Neurons
- Bellman Equation
- Autocorrelation
- Correlation
- Interview Review Material
- Linear Regression
- Atari
- Backpropagation Through Time
- badia2020agent57: Agent57: Outperforming the Atari Human Benchmark
- barreto2018successor: Successor Features for Transfer in Reinforcement Learning
- Critterbot
- dayan1993improving: Improving Generalization for Temporal Difference Learning: The Successor Representation
- General Value Functions
- gonzalez-soto2019reinforcement: Reinforcement Learning is not a Causal problem
- hochreiter1997long: Long Short-Term Memory
- hopfield1985neural: "Neural" computation of decisions in optimization problems
- huang2011predictive: Predictive Coding
- jaderberg2017reinforcement: Reinforcement Learning with Unsupervised Auxiliary Tasks
- james2004learning: Learning and Discovery of Predictive State Representations in Dynamical Systems with Reset
- John Hopfield
- kearney2019making: Making Meaning: Semiotics Within Predictive Knowledge Architectures
- kostas2019asynchronous: Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
- Lagrange multipliers
- lehnert2017advantages: Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning
- littman2002predictive: Predictive Representations of State
- machado2018revisiting: Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
- Maximum Likelihood Estimation
- mnih2014recurrent: Recurrent Models of Visual Attention
- Model-based RL
- Mountain Car
- Predictive Knowledge
- Reinforcement Learning: An Introduction
- scholkopf2019causality: Causality for Machine Learning
- singh2003learning: Learning Predictive State Representations
- spratling2017review: A review of predictive coding algorithms
- subramanian2020approximate: Approximate information state for approximate planning and reinforcement learning in partially observed systems
- Support Vector Machines
- sutton2011horde: Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction
- sutton2020john: John McCarthy's definition of intelligence
- veeriah2019discovery: Discovery of Useful Questions as Auxiliary Tasks
- wang2017learning: Learning to reinforcement learn
- white2015developing: Developing a predictive approach to knowledge
- zhang2021learning: Learning Causal State Representations of Partially Observable Environments
- Sufficient Statistic
- Auxiliary Tasks
- white2017unifying: Unifying Task Specification in Reinforcement Learning
- vanhasselt2015learning: Learning to Predict Independent of Span
- Value Function
- Unsupervised Learning
- Types of Learning
- Taylor Series Expansion
- synofzik2013experience: The experience of agency: an interplay between prediction and postdiction
- SVD
- sutskever2011generating: Generating text with recurrent neural networks
- Supervised Learning
- Subbasis
- stock2004short: A short history of ideo-motor action
- Stochastic Processes
- Stationary Point
- soga2009predictive: Predictive and postdictive mechanisms jointly contribute to visual awareness
- Sigmoid Function
- Semiotic
- Semi-supervised Learning
- Self-supervised Learning
- Rosenbrock function
- Richard Ernest Bellman
- rao1999predictive: Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects
- Puddle World
- Probability via Expectations
- Principal Component Analysis
- Pragmatism
- POMDP
- Policy Improvement
- Perpendicular Distance
- Optimization
- mohamed2019monte: Monte Carlo Gradient Estimation in Machine Learning
- Metric Space
- Markov Decision Process
- liu2018breaking: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
- Linear Programming
- Linear Map
- Laplace Transform
- Kolmogorov Complexity
- KL Divergence
- Kernel Function
- Integral Transform
- Inner Product Space
- Hypothesis
- Homogeneity
- Hoeffding Inequality
- Hilbert Space
- GRU
- Gradient Descent
- Function Space
- Experience Replay
- Environments
- Empirical Risk Minimization
- Eigenvalues and Eigenvectors
- Efficient Coding
- Dynamic Programming
- Discrete
- Dirac Delta Function
- Dimensionality Reduction
- DeepMind Lab
- Cross-Entropy
- Converges
- Control Theory
- Continuous
- Compute Canada
- Cognitive Science
- Classification
- chung2014empirical: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Cauchy Criterion
- byrd2019what: What is the Effect of Importance Weighting in Deep Learning?
- Biased Competition
- Behavior-Suite
- Backpropagation
- Creed(o) of Procrastination
- About