TMaze
Experience Replay Agent
TMazeERExperiment — Module

The experimental module for the TMaze experiments.
TMazeERExperiment.main_experiment — Function

main_experiment

This is the main experiment function for TMaze ER agents. See TMazeERExperiment.working_experiment for details on running on the command line and TMazeERExperiment.default_config for info about the default configuration.
TMazeERExperiment.working_experiment — Function

working_experiment

Creates a wrapper experiment where the main experiment is called with progress=true, testing=true, and the config is the default_config with the addition of the keyword arguments.
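As a rough sketch of this wrapper pattern (the config keys and default values below are illustrative assumptions, not the module's actual defaults):

```julia
# Illustrative sketch of working_experiment's config handling; the keys and
# defaults are hypothetical stand-ins for the module's default_config.
default_config() = Dict{String,Any}("seed" => 1, "steps" => 150_000, "size" => 6)

function working_experiment(; kwargs...)
    config = default_config()
    for (k, v) in pairs(kwargs)
        config[string(k)] = v  # keyword arguments override the defaults
    end
    # The real function would then call:
    # main_experiment(config; progress=true, testing=true)
    return config
end

cfg = working_experiment(size = 10, seed = 2)
```

Keyword arguments thus act as overrides layered on top of the defaults.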
TMazeERExperiment.default_config — Function

Automatically generated docs for the TMazeERExperiment config.
Experiment details.
- seed::Int: Seed of the RNG.
- steps::Int: Number of steps taken in the experiment.
Environment details
This experiment uses the TMaze environment. The usable args are:
- size::Int: Size of the hallway in the TMaze.
Agent details
RNN
The RNN used for this experiment and its total hidden size, as well as a flag to use (or not use) Zhu et al.'s deep action network. The usable args are:
- cell::String: The type of cell. Many types are possible.
- deepaction::Bool: Whether to use Zhu et al.'s deep action idea for RNNs.
- internal_a::Int: The size of the action representation layer when deepaction=true.
- numhidden::Int: Size of the hidden state in the RNNs.
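Put together, these settings might appear in a config as follows (the cell name and values are illustrative, not the module's defaults):

```julia
# Illustrative RNN section of a config; all values are assumptions.
rnn_config = Dict{String,Any}(
    "cell"       => "GRU",   # type of recurrent cell
    "numhidden"  => 10,      # size of the hidden state
    "deepaction" => true,    # use Zhu et al.'s deep action network
    "internal_a" => 6,       # action-representation size; only read when deepaction=true
)
```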
Optimizer details
Flux optimizers are used. See the Flux documentation and ExpUtils.Flux.get_optimizer for details.
- opt::String: The name of the optimizer used.
- Parameters defined by the particular optimizer.
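A minimal sketch of how an optimizer name plus its parameters could be resolved from a config. ExpUtils.Flux.get_optimizer presumably does something along these lines, but this standalone version is an assumption:

```julia
# Hypothetical resolver from config strings to optimizers. The tuples below
# stand in for Flux constructors such as Flux.ADAM(eta) and Flux.Descent(eta).
function get_optimizer_sketch(config::Dict)
    name = config["opt"]
    if name == "ADAM"
        return ("ADAM", config["eta"])
    elseif name == "Descent"
        return ("Descent", config["eta"])
    else
        error("Unknown optimizer: $name")
    end
end

opt = get_optimizer_sketch(Dict{String,Any}("opt" => "ADAM", "eta" => 0.001))
```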
Learning update and replay details including:
Replay:
- replay_size::Int: How many transitions are stored in the replay buffer.
- warm_up::Int: How many steps of warm-up (i.e. before learning begins).
Update details:
- lupdate::String: Learning update name.
- gamma::Float: The discount for the learning update.
- batch_size::Int: Size of the batch.
- truncation::Int: Length of sequences used for training.
- update_wait::Int: Time between updates (counted in agent interactions).
- target_update_wait::Int: Time between target network updates (counted in agent interactions).
- hs_strategy::String: Strategy for dealing with hidden state in the buffer.
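For example, the replay and update portion of a config might look like the following (the values, and names such as "QLearning" and "minimize", are illustrative assumptions):

```julia
# Illustrative replay/update section of a config; not the module's defaults.
learn_config = Dict{String,Any}(
    "replay_size"        => 10_000,      # transitions stored in the replay buffer
    "warm_up"            => 1_000,       # steps before learning begins
    "lupdate"            => "QLearning", # learning update name (hypothetical)
    "gamma"              => 0.99,        # discount factor
    "batch_size"         => 8,
    "truncation"         => 12,          # training sequence length
    "update_wait"        => 4,           # interactions between updates
    "target_update_wait" => 1_000,       # interactions between target syncs
    "hs_strategy"        => "minimize",  # hidden-state strategy (hypothetical)
)
```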
Default performance:
Time: 0:01:19
episode: 6455
successes: 0.9600399600399601
loss: 0.990142
l1: 0.000502145
action: 2
preds: Float32[0.3016336, 3.6225605, -2.5592222, 1.884988]
grad: 0.0

TMazeERExperiment.get_ann_size — Function

get_ann_size

Helper function which constructs the environment and agent using the default config and kwargs, then returns the number of parameters in the model.
TMazeERExperiment.construct_agent — Function

construct_agent

Construct the agent for TMazeERExperiment.
TMazeERExperiment.construct_env — Function

construct_env

Construct the directional TMaze using:

- size::Int: Size of the hallway.
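As a toy illustration of what the size argument controls (a standalone sketch of the TMaze's structure, not the package's implementation; the observation encoding is an assumption):

```julia
# Toy TMaze: the agent walks down a hallway of length `size`, then must turn
# toward the goal arm indicated by a cue visible only at the start.
struct ToyTMaze
    size::Int  # length of the hallway
    goal::Int  # +1 = goal in the upper arm, -1 = goal in the lower arm
end

# The cue is observable only in the first cell; hallway cells look identical,
# which is what makes the task require memory (and hence an RNN agent).
observe(env::ToyTMaze, pos::Int) =
    pos == 1 ? (env.goal == 1 ? [1, 1, 0] : [0, 1, 1]) : [0, 1, 0]

env = ToyTMaze(6, 1)
```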