TMaze

Experience Replay Agent

TMazeERExperiment.working_experiment — Function
working_experiment

Creates a wrapper around the main experiment, calling it with progress=true and testing=true, and with a config built from default_config plus any keyword arguments passed in.
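A call might look like the following. This is a hedged sketch: the keyword names are config keys documented under default_config below, while the specific values are illustrative assumptions rather than verified defaults.

  using TMazeERExperiment

  # Run the experiment with progress printing and testing enabled, overriding a
  # few config keys via keyword arguments (values are illustrative only).
  TMazeERExperiment.working_experiment(; steps=50_000, cell="GRU", numhidden=10)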

TMazeERExperiment.default_config — Function

Automatically generated docs for TMazeERExperiment config.

Experiment details

  • seed::Int: Seed of the RNG.
  • steps::Int: Number of steps taken in the experiment.

Environment details

This experiment uses the TMaze environment. The usable args are:

  • size::Int: Size of the hallway in the TMaze.

Agent details

RNN

The arguments below control the RNN used for this experiment and its total hidden size, as well as a flag to use (or not use) Zhu et al.'s deep action network; a conceptual sketch of that idea follows the argument list.

  • cell::String: The type of recurrent cell. Many types are possible.
  • deepaction::Bool: Whether to use Zhu et al.'s deep action network idea for RNNs.
  • internal_a::Int: The size of the action representation layer when deepaction=true.
  • numhidden::Int: Size of hidden state in RNNs.
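The sketch below illustrates the deep-action input construction described above. It is an assumption about how the action representation layer of size internal_a might be combined with the observation, not the package's actual implementation, and the layer sizes are arbitrary.

  using Flux

  n_actions, obs_size, internal_a = 4, 6, 6

  # Action representation layer of size internal_a (used when deepaction=true).
  action_layer = Dense(n_actions => internal_a, relu)

  obs = rand(Float32, obs_size)                     # observation from the environment
  a_onehot = Float32.(Flux.onehot(2, 1:n_actions))  # one-hot encoding of action 2

  # One plausible way to form the recurrent cell's input: concatenate the
  # observation with the learned action representation.
  rnn_input = vcat(obs, action_layer(a_onehot))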

Optimizer details

Flux optimizers are used. See the Flux documentation and ExpUtils.Flux.get_optimizer for details; an illustrative sketch of the kind of name-to-optimizer mapping involved follows the list below.

  • opt::String: The name of the optimizer used.
  • Parameters defined by the particular optimizer.
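The sketch below shows the kind of name-to-optimizer mapping described above. It is not ExpUtils.Flux.get_optimizer itself: the function name, the eta learning-rate key, and the set of recognized optimizer names are assumptions for illustration.

  using Flux

  # Illustrative only: map the config's optimizer name and a hypothetical
  # learning-rate key "eta" to a Flux optimizer.
  function make_optimizer(config)
      name = config["opt"]
      eta = get(config, "eta", 0.001)
      if name == "ADAM"
          Flux.Adam(eta)      # Flux.ADAM in older Flux releases
      elseif name == "RMSProp"
          Flux.RMSProp(eta)
      elseif name == "Descent"
          Flux.Descent(eta)
      else
          error("Unknown optimizer: $name")
      end
  end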

Learning update and replay details, including the following (an illustrative config using all of the documented keys follows this list):

  • Replay:

    • replay_size::Int: How many transitions are stored in the replay.
    • warm_up::Int: How many steps for warm-up (i.e. before learning begins).
  • Update details:

    • lupdate::String: Name of the learning update.
    • gamma::Float: The discount factor used by the learning update.
    • batch_size::Int: Size of the training batch.
    • truncation::Int: Length of sequences used for training.
    • update_wait::Int: Time between updates (counted in agent interactions)
    • target_update_wait::Int: Time between target network updates (counted in agent interactions)
    • hs_strategy::String: Strategy for dealing w/ hidden state in buffer.
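Putting the documented keys together, a config might look like the following. The keys are those listed above; every value is an illustrative assumption, not the package's actual default.

  # Hypothetical config built from the documented keys; values are illustrative only.
  config = Dict(
      # Experiment
      "seed" => 1,
      "steps" => 150_000,
      # Environment
      "size" => 6,
      # Agent / RNN
      "cell" => "GRU",
      "numhidden" => 10,
      "deepaction" => false,
      # Optimizer (plus any optimizer-specific parameters, e.g. a learning rate)
      "opt" => "ADAM",
      # Learning update and replay
      "replay_size" => 10_000,
      "warm_up" => 1_000,
      "lupdate" => "QLearning",
      "gamma" => 0.99,
      "batch_size" => 8,
      "truncation" => 12,
      "update_wait" => 4,
      "target_update_wait" => 1_000,
      "hs_strategy" => "minimize",
  )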

Default performance:

Time: 0:01:19
  episode:    6455
  successes:  0.9600399600399601
  loss:       0.990142
  l1:         0.000502145
  action:     2
  preds:      Float32[0.3016336, 3.6225605, -2.5592222, 1.884988]
  grad:       0.0

TMazeERExperiment.get_ann_size — Function
get_ann_size

Helper function which constructs the environment and agent using the default config plus any keyword arguments, then returns the number of parameters in the model.
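A hypothetical call is shown below; the keyword names are config keys documented above, and the values are illustrative overrides rather than verified defaults.

  using TMazeERExperiment

  # Count the parameters in the model built for a given cell type and hidden size
  # (keyword overrides of the default config; values are illustrative only).
  n_params = TMazeERExperiment.get_ann_size(; cell="GRU", numhidden=20)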
