General Documentation

This page hosts the general documentation of the ActionRNNs.jl library. This includes all research code used in this project.

Cells

ActionRNNs.AbstractActionRNNType
AbstractActionRNN

An abstract type for cells that take the current hidden state and a tuple of observations and actions and return the next hidden state.

source

Basic Cells

ActionRNNs.AARNNFunction
AARNN(in::Integer, actions::Integer, out::Integer, σ = tanh)

Like an RNN cell, except it takes a tuple (action, observation) as input. The action is passed through get_waa and the result is added to the usual update.

The update is as follows: σ.(Wi*o .+ get_waa(Wa, a) .+ Wh*h .+ b)
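As a rough sketch of what this computes (not the library's implementation), with an integer action and assuming get_waa(Wa, a) reduces to selecting the a-th column of Wa:

```julia
# Illustrative dimensions only: in = 4, actions = 3, out = 6.
o  = rand(Float32, 4)        # observation
h  = zeros(Float32, 6)       # previous hidden state
a  = 2                       # integer action
Wi = randn(Float32, 6, 4)    # observation weights
Wa = randn(Float32, 6, 3)    # per-action additive weights
Wh = randn(Float32, 6, 6)    # recurrent weights
b  = zeros(Float32, 6)

# matches σ.(Wi*o .+ get_waa(Wa, a) .+ Wh*h .+ b) with σ = tanh
h′ = tanh.(Wi*o .+ Wa[:, a] .+ Wh*h .+ b)
```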

source
ActionRNNs.AAGRUFunction
AAGRU(in, actions, out)

Additive Action Gated Recurrent Unit layer. Behaves like an AARNN but uses a GRU internal structure

source
ActionRNNs.MARNNFunction
MARNN(in::Integer, actions::Integer, out::Integer, σ = tanh)

This cell incorporates the action as a multiplicative operation. We use contract_WA and get_waa to handle this.

The update is as follows:

new_h = σ.(contract_WA(m.Wx, a, o) .+ contract_WA(m.Wh, a, h) .+ get_waa(m.b, a))
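A rough sketch of the multiplicative update for an integer action, assuming contract_WA(W, a, x) reduces to slicing the action dimension of W (the weights are assumed to be nactions × out × in, see contract_WA below) and get_waa(b, a) selects a column of b:

```julia
# Illustrative dimensions only: in = 4, actions = 3, out = 6.
o  = rand(Float32, 4)           # observation
h  = zeros(Float32, 6)          # previous hidden state
a  = 1                          # integer action
Wx = randn(Float32, 3, 6, 4)    # action-conditioned input weights
Wh = randn(Float32, 3, 6, 6)    # action-conditioned recurrent weights
b  = zeros(Float32, 6, 3)       # per-action bias

h′ = tanh.(Wx[a, :, :]*o .+ Wh[a, :, :]*h .+ b[:, a])
```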
source
ActionRNNs.MAGRUFunction
MAGRU(in, actions, out)

Multiplicative Action Gated Recurrent Unit layer. Behaves like an MARNN but uses a GRU internal structure.

source
ActionRNNs.FacMARNNFunction
FacMARNN(in::Integer, actions::Integer, out::Integer, factors, σ = tanh; init_style="ignore")

This cell incorporates the action as a multiplicative operation, but as a factored approximation of the multiplicative version. This cell uses get_waa. Uses CP decomposition.

The update is as follows:

   new_h = m.σ.(W*((Wx*o .+ Wh*h) .* get_waa(Wa, a)) .+ get_waa(m.b, a))
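A rough sketch of the factored update under the same assumptions as above (integer action, get_waa selecting a column), with a hypothetical factor count k:

```julia
# Illustrative dimensions only: in = 4, actions = 3, out = 6, factors k = 5.
o  = rand(Float32, 4)
h  = zeros(Float32, 6)
a  = 3
k  = 5
Wx = randn(Float32, k, 4)    # input factor loadings
Wh = randn(Float32, k, 6)    # recurrent factor loadings
Wa = randn(Float32, k, 3)    # per-action factor scalings
W  = randn(Float32, 6, k)    # output mixing matrix
b  = zeros(Float32, 6, 3)

h′ = tanh.(W*((Wx*o .+ Wh*h) .* Wa[:, a]) .+ b[:, a])
```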

Three init_styles:

  • standard: using init and initb w/o any keywords
  • ignore: W = init(out, factors, ignore_dims=2)
  • tensor: Decompose W_t = init(actions, out, in+out; ignore_dims=1) to get W_o, W_a, W_hi using TensorToolbox.cp_als.
source
ActionRNNs.FacMAGRUFunction
FacMAGRU(in, actions, out, factors)

Factored Multiplicative Action Gated Recurrent Unit layer. Behaves like an FacMARNN but uses a GRU internal structure.

Three init_styles:

  • standard: using init and initb w/o any keywords
  • ignore: W = init(out, factors, ignore_dims=2)
  • tensor: Decompose W_t = init(actions, out, in+out; ignore_dims=1) to get W_o, W_a, W_hi using TensorToolbox.cp_als.
source
ActionRNNs.FacTucMARNNFunction
FacTucMARNN(in::Integer, actions::Integer, out::Integer, action_factors, out_factors, in_factors, σ = tanh; init_style="ignore")

This cell incorporates the action as a multiplicative operation, but as a factored approximation of the multiplicative version. This cell uses get_waa. Uses Tucker decomposition.

Three init_styles:

  • standard: using init and initb w/o any keywords
  • ignore: Wa = init(action_factors, actions; ignore_dims=2)
source
ActionRNNs.FacTucMAGRUFunction
FacTucMAGRU(in, actions, out, factors)

Factored Multiplicative Action Gated Recurrent Unit layer. Behaves like an FacTucMARNN but uses a GRU internal structure.

source

Combo Cells

ActionRNNs.CaddAAGRUFunction
CaddAAGRU(in, actions, out)

Mixing between two AAGRU cells through a weighting:

```julia
h′ = (w[1]*new_hAA1 + w[2]*new_hAA2) ./ sum(w)
```

source
ActionRNNs.CaddMAGRUFunction
CaddMAGRU(in, actions, out)

Mixing between two MAGRU cells through a weighting:

```julia
h′ = (w[1]*new_hMA1 + w[2]*new_hMA2) ./ sum(w)
```

source
ActionRNNs.CaddElRNNFunction
CaddElRNN(in, actions, out, σ = tanh)

Mixing between AARNN and MARNN through a weighting

h′ = (AA_θ .* AA_h′ .+ MA_θ .* MA_h′) ./ (AA_θ .+ MA_θ)
source
ActionRNNs.CaddElGRUFunction
CaddElGRU(in, actions, out)

Mixing between AAGRU and MAGRU through a weighting

h′ = (AA_θ .* AA_h′ .+ MA_θ .* MA_h′) ./ (AA_θ .+ MA_θ)
source

Mixed Cells

ActionRNNs.MixRNNFunction
MixRNN(in, actions, out, num_experts, σ = tanh)

Mixing between num_experts AARNN cells. Uses the weighting

h′ = sum(θ[i] .* expert_h′[i] for i in 1:length(θ)) ./ sum(θ)
source
ActionRNNs.MixElRNNFunction
MixElRNN(in, actions, out, num_experts, σ = tanh)

Mixing between num_experts AARNN cells. Uses the weighting

h′ = sum(θ[i] .* expert_h′[i] for i in 1:length(θ)) ./ sum(θ)

(here θ[i] is a vector).

source
ActionRNNs.MixGRUFunction
MixGRU(in, actions, out, num_experts)

Mixing between num_experts AAGRU cells. Uses the weighting

h′ = sum(θ[i] .* expert_h′[i] for i in 1:length(θ)) ./ sum(θ)
source
ActionRNNs.MixElGRUFunction
MixElGRU(in, actions, out, num_experts)

Mixing between num_experts AAGRU cells. Uses the weighting

h′ = sum(θ[i] .* expert_h′[i] for i in 1:length(θ)) ./ sum(θ)

(here θ[i] is a vector).

source
ActionRNNs.ActionGatedRNNFunction
ActionGatedRNN(in::Integer, na, internal, out::Integer, σ = tanh)

The most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.

source

Old/Defunct Cells

ActionRNNs.GAIARNNFunction
GAIARNN(in::Integer, na, internal, out::Integer, σ = tanh)

The most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.

source

Shared operations for cells

HelpfulKernelFuncs.contract_WAFunction
contract_WA(W, a::Int, x)
contract_WA(W, a::AbstractVector{Int}, x)
contract_WA(W, a::AbstractVector{<:AbstractFloat}, x)
contract_WA(W::CuArray, a::AbstractVector{Int}, x)

This contraction operator takes the weights W, the action (or action vector for batches) a, and the features x. The weight array W is assumed to have shape nactions × out × in.
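A minimal reference sketch of the semantics (the library's kernels are optimized and also provide a CuArray method; contract_WA_ref is a hypothetical name used here for illustration):

```julia
# Hypothetical reference implementation; W is nactions × out × in.
function contract_WA_ref(W::AbstractArray{<:Real,3}, a::Int, x::AbstractVector)
    W[a, :, :] * x
end

# Batched case: one action per column of x (x is in × batch).
function contract_WA_ref(W::AbstractArray{<:Real,3}, a::AbstractVector{Int}, x::AbstractMatrix)
    hcat((W[a[i], :, :] * x[:, i] for i in eachindex(a))...)
end
```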

HelpfulKernelFuncs.get_waaFunction
get_waa(Wa, a)

Different ways of handling getting the action value from a set of weights. This operation can be seen as Wa*a, where Wa is the weight matrix and a is the action representation. It is used by the various cells to incorporate this operation more reliably.
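A minimal sketch of the two most common cases (integer action vs. action-feature vector); get_waa_ref is a hypothetical name used here for illustration:

```julia
# With an integer action, select the corresponding column of Wa;
# with an action-feature vector, this is just the matrix-vector product Wa*a.
get_waa_ref(Wa::AbstractMatrix, a::Int)            = Wa[:, a]
get_waa_ref(Wa::AbstractMatrix, a::AbstractVector) = Wa * a
```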

Other Layers

ActionRNNs.ActionDenseType
ActionDense(in, na, out, σ; init, bias)

Create an action-conditional Dense layer. This layer takes a tuple (action, observation) and applies the dense layer using an additive approach. It can be used with previous actions or current actions.
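A minimal usage sketch; the dimensions are illustrative, and the commented call assumes the (action, observation) tuple convention described above:

```julia
using ActionRNNs

layer = ActionDense(4, 3, 16, tanh)   # in = 4 features, na = 3 actions, out = 16
# y = layer((2, rand(Float32, 4)))    # assumed (action, observation) calling convention
```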

source

Learning Updates

ActionRNNs.QLearningType
QLearning
QLearningMSE(γ)
QLearningSUM(γ)
QLearningHUBER(γ)

Watkins Q-learning with various loss functions.

source

Constructors

ActionRNNs.build_rnn_layerFunction
build_rnn_layer(in, actions, out, parsed, rng)

Build an RNN layer from the parsed config dict. This assumes the "cell" key is in the parsed dict. in, actions, and out are integers. An RNG must be passed in explicitly.

Gets layer constructor from either the ActionRNNs or Flux namespaces.
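A minimal sketch of building a layer from a config dict; the cell name "MAGRU" and the dimensions are illustrative, and additional keys depend on the build type (see the methods below):

```julia
using ActionRNNs, Random

parsed = Dict("cell" => "MAGRU")    # "cell" is the required key
rng = Random.MersenneTwister(1)     # the RNG must be passed explicitly
layer = ActionRNNs.build_rnn_layer(10, 4, 32, parsed, rng)
```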

Build types:

source
build_rnn_layer(::BuildActionRNN, args...; kwargs...)

Standard Additive and Multiplicative cells. No extra parameters.

source
build_rnn_layer(::BuildFactored, args...; kwargs...)

Factored (not Tucker) cells. Extra config options:

  • init_style::String: The style of initialization. Check your cell for possible options.
  • factors::Int: Number of factors in the factorization.
source
build_rnn_layer(::BuildTucFactored, args...; kwargs...)

Tucker-factored cells. Extra config options:

  • in_factors::Int: Number of factors in input matrix
  • action_factors::Int: Number of factors in action matrix
  • out_factors::Int: Number of factors in out matrix
source
build_rnn_layer(::BuildComboCat, args...; kwargs...)

Combo cat AA/MA cells. No extra parameters.

source
build_rnn_layer(::BuildComboAdd, args...; kwargs...)

Combo add AA/MA cells. No extra parameters.

source
build_rnn_layer(::BuildMixed, args...; kwargs...)

Mixed layers. Extra config options:

  • num_experts::Int: number of parallel cells in the mixture.

source
build_rnn_layer(::BuildFlux, args...; kwargs...)

Flux cell. No extra parameters.

source

Agents

Experience Replay Agents

ActionRNNs.AbstractERAgentType
AbstractERAgent

The abstract struct for building experience replay agents.

An example agent:

```julia
mutable struct DRQNAgent{ER, Φ, Π, HS<:AbstractMatrix{Float32}} <: AbstractERAgent
    lu::LearningUpdate
    opt::O
    model::C
    target_network::CT

    build_features::F
    state_list::DataStructures.CircularBuffer{Φ}

    hidden_state_init::Dict{Symbol, HS}

    replay::ER
    update_timer::UpdateTimer
    target_update_timer::UpdateTimer

    batch_size::Int
    τ::Int

    s_t::Φ
    π::Π
    γ::Float32

    action::Int
    am1::Int
    action_prob::Float64

    hs_learnable::Bool
    beg::Bool
    cur_step::Int

    hs_tr_init::Dict{Symbol, HS}
end
```

source

Instantiations

Implementation details

MinimalRLCore.start!Method
    MinimalRLCore.start!(agent::AbstractERAgent, s, rng; kwargs...)

Start the agent for a new episode.

source
MinimalRLCore.step!Method
MinimalRLCore.step!(agent::AbstractERAgent, env_s_tp1, r, terminal, rng; kwargs...)

step! for an experience replay agent.

source
MinimalRLCore.step!Function
MinimalRLCore.step!(agent::AbstractERAgent, env_s_tp1, r, terminal, rng; kwargs...)

step! for an experience replay agent.

source
ActionRNNs.update!Method
update!(agent::AbstractERAgent{<:ControlUpdate}, rng)

Update the parameters of the model.

source
ActionRNNs.update!Method
update!(agent::AbstractERAgent{<:PredictionUpdate}, rng)

Update the parameters of the model.

source
ActionRNNs.update!Function
update!(agent::AbstractERAgent{<:ControlUpdate}, rng)

Update the parameters of the model.

source
update!(agent::AbstractERAgent{<:PredictionUpdate}, rng)

Update the parameters of the model.

source

Online Agents

Tools/Utils

ActionRNNs.make_obs_listFunction
make_obs_list

Makes the obs list and initial state used for recurrent networks in an agent. Uses an init function to define the init tuple.

source
Hidden state manipulation
ActionRNNs.reset!Function
reset!(m, h_init::Dict)
reset!(m::Flux.Recur, h_init)

Reset the hidden state according to the dict h_init, with keys from [`get_hs_symbol_list`](@ref). If the model is a Flux.Recur, just replace the hidden state.

source
Replay buffer
ActionRNNs.CircularBufferType

CircularBuffer

Maintains a buffer of fixed size w/o reallocating and deallocating memory, using a circular queue data structure.

source
ActionRNNs.StateBufferType
StateBuffer(size::Int, state_size)

A circular buffer for states. Typically used for images; can be used for state shapes up to 4d.

source
Base.lengthMethod
length(buffer)

Returns the current amount of data in the circular buffer. If the full flag is true then we return the size of the whole data frame.

source
Base.push!Method
push!(buffer, data)

Adds data to the buffer, where data is an array of collections of the types defined in CircularBuffer.datatypes. Returns the row of the added data.

source
ActionRNNs.get_state_from_experienceFunction
get_state_from_experience

Returns hidden state from experience sampled from an experience replay buffer. This assumes the replay has (:am1, :s, :a, :sp, :r, :t, :beg, hs_symbol...) as columns.

source
ActionRNNs.get_information_from_experienceFunction
get_information_from_experience(agent, exp)

Gets the tuple of required details for the update of the agent. This is dispatched on the type of learning update. You can use the helper abstract types, or dispatch on your specific update.

source
ActionRNNs.get_hs_from_experience!Function
get_hs_from_experience!(model, exp::NamedTuple, hs_dict::Dict, device)
get_hs_from_experience!(model, exp::Vector, hs_dict::Dict, device)

Get the hidden state in the appropriate format from the experience (either a NamedTuple or a vector of NamedTuples).

source
Flux Chain Manipulation
ActionRNNs.find_layers_with_eqFunction
find_layers_with_eq(eq::Function, model)

A function which takes a model and a predicate function and returns the locations where the predicate returns true. This only supports chains composed (nested) at most twice.
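For example, a sketch that finds the recurrent layers of a Flux chain (the predicate and model are illustrative):

```julia
using ActionRNNs, Flux

model = Flux.Chain(Flux.Dense(4, 8), Flux.RNN(8, 8), Flux.Dense(8, 2))
locs = ActionRNNs.find_layers_with_eq(l -> l isa Flux.Recur, model)
```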

source

Policies

ActionRNNs.ϵGreedyDecayType
ϵGreedyDecay{AS}(ϵ_range, decay_period, warmup_steps, action_set::AS)
ϵGreedyDecay(ϵ_range, end_step, num_actions)

This is an acting policy which decays exploration linearly over time. This API will possibly change over time once I figure out a better way to specify decaying epsilon.

Arguments

  • ϵ_range::Tuple{Float64, Float64}: (max epsilon, min epsilon)
  • decay_period::Int: period over which epsilon decays
  • warmup_steps::Int: number of steps before decay starts
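For example, a sketch of a policy decaying ϵ from 1.0 down to 0.1 over the first 10,000 steps with 4 actions (the values are illustrative):

```julia
using ActionRNNs

policy = ActionRNNs.ϵGreedyDecay((1.0, 0.1), 10_000, 4)
```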

source

Feature Constructors

Environments

RingWorld

ActionRNNs.RingWorldType

RingWorld

A ring of states 1, 2, 3, ..., n whose ends wrap around. The observation is 1 in state 1 and 0 in every other state.

chain_length: size (diameter) of the ring
actions: Forward or Backward

source

LinkedChains

ActionRNNs.LinkedChainsV2Type
LinkedChains

termmode:

  • CONT: No termination
  • TERM: Terminate after chain

dynmode:

  • STRAIGHT: high negative reward on wrong actions, but still progress through the chain
  • JUMP: Jump to different chain on wrong action
  • STUCK: Don't progress on wrong action
  • JUMPSTUCK: Get "lost" with wrong actions, still being implemented.
source

TMaze

DirectionalTMaze

ActionRNNs.DirectionalTMazeType
DirectionalTMaze

Similar to ActionRNNs.TMaze but with a directional component overlaid on top. This also changes the observation structure: the agent must know which direction it is facing to get information about which goal is the good goal.

source

Masked Grid World

ActionRNNs.MaskedGridWorldType
MaskedGridWorld

This grid world gives observations at a set of randomly placed anchor states, which may be aliased (or not, depending on obs_strategy). This environment also has the pacman_wrapping flag, which makes the edges wrap around.

  • width::Int: width of gw
  • height::Int: height of gw
  • anchors::Int: number of anchors (Int), or list of anchor states
  • goals_or_rews: number of goals, list of goals, or list of rewards.
  • obs_strategy: which observations are returned: :seperate, :full, or :aliased
  • pacman_wrapping::Bool: whether the walls are invisible and wrap around
source

Lunar Lander

FluxUtils Stuff

ActionRNNs.ExpUtils.FluxUtils.get_optimizerFunction
get_optimizer

Return the Flux optimizer given a config dictionary. The optimizer name is found at key "opt". The parameters also change based on the optimizer.

  • OneParamInit: eta::Float
  • TwoParamInit: eta::Float, rho::Float
  • AdamParamInit: eta::Float, beta::Vector or (beta_m::Int, beta_v::Int)
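For example, a sketch of a config selecting an Adam-style optimizer; the key names other than "opt" are assumptions for illustration:

```julia
using ActionRNNs

config = Dict("opt" => "ADAM", "eta" => 0.001)   # hypothetical keys beyond "opt"
opt = ActionRNNs.ExpUtils.FluxUtils.get_optimizer(config)
```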
source

Misc