About

I’m a PhD student at the University of Alberta in Edmonton, Alberta, Canada. I have a BS in Physics and an MS in Computer Science, both from Indiana University Bloomington. My current PhD work focuses on reinforcement learning, specifically on understanding how agents may perceive their world. I focus primarily on prediction making, but have been known to dabble in control from time to time. My active research interests include: predictions as a component of intelligence (both artificial and biological), off-policy prediction and policy evaluation, deep learning and the resulting learned representations in the reinforcement learning context, and the discovery of, or attention to, important abstractions (described as predictions) through interaction.

About My Research

The core driving principle of my research is to develop and investigate autonomous agents which continually learn from their world. Every year seems to bring the world closer to the autonomous revolution promised by Turing when he first described a general computational machine. While massive progress continues to be made in supervised learning, from the automatic generation of images to systems which convincingly synthesize and communicate the human corpus of Internet text, there is still a considerable gap in translating this progress to full automation of real-world tasks. This is not to say that deep reinforcement learning has not been successful: agents have competed against world champions in Go (Silver, 2017) and StarCraft (Vinyals, 2019), and have even shown promise in fusion reactor control (Degrave, 2022).

The next step in machine intelligence will be developing full systems which can learn re-usable embodied structures to be leveraged for learning new tasks without intervention from the system designer. Current reinforcement learning research directions require significant amounts of simulated data to learn representations which are often brittle to changes in the data-generating distribution. While many labs are focusing on learning representations which are less brittle in continual learning settings, there is still a large gap in understanding how agents should behave to encourage learning about the world, beyond a hand-designed reward function. An autonomous agent which learns continually without intervention from the designer must be able to 1) propose new goals or predictive challenges, 2) learn behavioral strategies to solve its internalized challenges, and 3) modularly re-use learned embodied structures.

So far, my research strategy for developing continual learning agents has been to develop new algorithms for learning predictive challenges off-policy (Schlegel, 2019), investigate novel connectionist architectures which favor more stochastic, modular networks (Schlegel, 2021; Gupta, 2021), improve widely used representation learning techniques with straightforward strategies (Schlegel, 2022), and explore how we might guide exploration (Kumaraswamy, 2018; McLeod, 2021). I have the skills to develop curiosity-driven agents and to design frameworks which simulate an agent’s lifetime (rather than short-lived systems). My expertise best positions me to tackle these research questions through software simulation, but I believe an approach which incorporates the knowledge developed through the behavioral, cognitive, and developmental sciences is critical to making progress. In the future, I am looking for postdoctoral positions where I can develop and engage with research at the intersection of artificial intelligence and cognitive science.

Curiosity-driven behavior through simple mechanisms

While we should focus on an agent’s high-level capabilities to propose and achieve new goals, the recent success of ChatGPT raises a simpler question in relation to goals. The emergent behavior of GPT-3 comes from the self-supervised goal of predicting the next word in a document. While simple, this rule has generated interesting and complex emergent behavior well beyond the original intent of the system. We should take this lesson into the continual learning context and uncover what behaviors emerge from curiosity-driven objectives built on simple predictive challenges. The learned behavioral strategies may be useful for later tasks (Groth, 2021), or may naturally lead to more complex emergent behaviors and objectives as the agent becomes better able to predict its sensorimotor stream. The types of behavior learned through this process could also lead to systems which can easily adapt as the set of prescribed goals in the environment changes (Adaptive Agent Team, 2023).

To stay brief, there are many avenues of research for driving behavior through curiosity. One direction I’ve explored is found in McLeod (2021) and Linke (2020), where curiosity is expressed through a reward function derived from the change in the weights of the agent’s predictive challenges. These papers explore the consequences of this reward function on several non-stationary prediction challenges in the multi-armed bandit and reinforcement learning settings. You can see these papers for more details on the rest of the field, which I may delve into here at a later date.
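As a concrete illustration, here is a minimal sketch of the weight-change idea: a linear TD(0) learner updates its prediction weights, and the magnitude of that update becomes the intrinsic reward handed to a separate behavior learner. The function names (td_step, weight_change_reward) and all constants are my own illustrative choices, not an implementation from either paper.

```python
import numpy as np

def td_step(w, x, r, x_next, gamma=0.9, alpha=0.1):
    """One linear TD(0) update for a prediction learner."""
    delta = r + gamma * w @ x_next - w @ x  # TD error
    return w + alpha * delta * x

def weight_change_reward(w_before, w_after):
    """Intrinsic reward: the magnitude of the prediction learner's update."""
    return np.linalg.norm(w_after - w_before)

rng = np.random.default_rng(0)
w = np.zeros(4)
x, r, x_next = rng.random(4), 1.0, rng.random(4)  # one toy transition
w_new = td_step(w, x, r, x_next)
r_intrinsic = weight_change_reward(w, w_new)  # handed to the behavior learner
```

The appeal of this signal is its simplicity: the agent is rewarded wherever its predictions are still changing, so behavior is drawn toward parts of the world it has not yet learned.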

Main research question: Does learning to behave through a curiosity-driven intrinsic reward, aimed at improving a simple prediction challenge, result in complex emergent behavior which can be useful over an agent’s full lifetime?

Predictions as cognition

I am keenly interested in the relationship between prior work on predictive cognition in the brain and current research in reinforcement learning, specifically the use and learning of general value functions (GVFs) as predictive units. While there are many types of predictions (with subsequent definitions and nuances), the most interesting is how predictions affect behavior, which is defined as anticipation (Bubic, 2010). There are several issues with using GVFs trained online to drive behavior which inform much of my research: stability, off-policy policy evaluation, predictions as representation, and others.
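To make the GVF framing concrete, the sketch below treats a GVF as a "question" given by a cumulant (the signal to predict) and a continuation function (a per-step discount), answered by a linear TD(0) learner. This is a simplified, on-policy sketch in my own notation; it omits the off-policy corrections for a target policy that much of this work is concerned with.

```python
import numpy as np

class GVF:
    """A general value function 'question': a cumulant c (the signal to
    predict) and a continuation function gamma (a per-step discount),
    answered here by a linear TD(0) learner. Illustrative sketch only."""

    def __init__(self, n_features, cumulant, continuation, alpha=0.1):
        self.w = np.zeros(n_features)
        self.cumulant = cumulant          # c(obs) -> float
        self.continuation = continuation  # gamma(obs) -> value in [0, 1]
        self.alpha = alpha

    def predict(self, x):
        return self.w @ x

    def update(self, x, obs_next, x_next):
        # TD(0): move the prediction toward c + gamma * v(x').
        c = self.cumulant(obs_next)
        gamma = self.continuation(obs_next)
        delta = c + gamma * self.predict(x_next) - self.predict(x)
        self.w += self.alpha * delta * x

# e.g., "how much light will the robot see soon?":
# gvf = GVF(n_features=32,
#           cumulant=lambda obs: obs["light"],
#           continuation=lambda obs: 0.9)
```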

Main research question: Can we develop an account of cognition for both biological and artificial systems using temporal predictions as its core mechanism?

The focus of such an account should be plausibility in both biological and artificial systems. Implementation and learning are at the forefront of the account, ensuring such an architecture could be implemented in a continual learning system.

Off-policy Policy Evaluation and Prediction

To learn many predictions online, it is beneficial to learn about behaviors which are not currently being followed; this is where off-policy policy evaluation comes in. The main contribution of this line of work has been the importance resampling algorithm, which empirically reduces variance in many scenarios without introducing the significant bias of VTrace or the computational requirements of Weighted Importance Sampling. I am also interested in understanding how RMSProp/AMSGrad-style learning rate adaptation algorithms interact with off-policy learning, and in extending importance resampling to the case of many value functions with a shared representation.
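For intuition, here is a rough sketch of the importance resampling idea: transitions are stored in a buffer alongside their importance sampling ratios rho = pi(a|s) / mu(a|s), and minibatches are drawn proportionally to those ratios rather than weighting each update by rho. The helper name and the toy buffer are hypothetical, not code from the paper.

```python
import numpy as np

def importance_resample(buffer, rhos, batch_size, rng):
    """Draw a minibatch with probability proportional to each transition's
    importance sampling ratio rho = pi(a|s) / mu(a|s). The ratio is absorbed
    into the sampling distribution, so the TD update itself needs no
    per-sample rho weighting."""
    probs = rhos / rhos.sum()
    idx = rng.choice(len(buffer), size=batch_size, p=probs)
    return [buffer[i] for i in idx]

rng = np.random.default_rng(0)
buffer = [(f"s{i}", "a", 0.0, f"s{i+1}") for i in range(100)]  # toy transitions
rhos = rng.uniform(0.0, 2.0, size=100)                         # pi/mu ratios
batch = importance_resample(buffer, rhos, batch_size=8, rng=rng)
```

The design intuition is that sampling proportionally to rho moves the variability of the off-policy correction out of the update and into the sampling distribution, which is where the empirical variance reduction comes from.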

Discovery in Predictive Representations

Representations built from predictions (and really all methods using predictions to drive behavior) face a common hurdle: the discovery problem. This is the creation and learning of useful predictions as an online process. My current work focuses on defining the basic aspects of the problem and describing a general framework for discovery. This leads in two directions. The first is describing an ontology, or an ordering, over general value function question specifications, so as to generate a diverse set of questions. The second is credit assignment: measuring the usefulness of a prediction in driving behavior.

Other Interests

I’ve spent a lot of time developing my musical ability, and even spent a short time as a music student in the Jacobs School of Music. I’m not currently playing in any ensembles, but in the past I’ve been a part of Indiana University’s All-Campus Band (first chair), IU’s Concert Band, and the Southern Indiana Wind Ensemble (SIWE) (soloist/2nd chair). I also enjoy jazz (mostly bebop and big-band) and going to orchestral concerts (my favourites include the Chicago Symphony Orchestra and the Grant Park Orchestra). In my free time I like playing around with elisp and my emacs configuration, playing piano, reading popsci (especially about intelligence/neuroscience), and drinking espresso!