Lunar Lander
LunarLanderExperiment — Module
LunarLanderExperiment
Experiment module for running experiments in Lunar Lander.
LunarLanderExperiment.main_experiment — Function
main_experiment
Run an experiment from config. See LunarLanderExperiment.working_experiment for details on running on the command line and LunarLanderExperiment.default_config for info about the default configuration.
LunarLanderExperiment.working_experiment — Function
working_experiment
Creates a wrapper experiment that calls the main experiment with progress=true and testing=true. The config is the default_config with the keyword arguments added.
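The wrapper pattern described for working_experiment can be sketched roughly as follows. This is an illustration only, not the package's implementation: default_config_sketch, main_experiment_sketch and working_experiment_sketch are hypothetical stand-ins, and the default values are assumptions.

```julia
# Hypothetical stand-ins; the real functions live in LunarLanderExperiment.
# The default values here are assumptions chosen for illustration.
default_config_sketch() = Dict{String,Any}("seed" => 1, "steps" => 1000)

# Placeholder for the main experiment: it only echoes what it received.
function main_experiment_sketch(config; progress=false, testing=false)
    return (config=config, progress=progress, testing=testing)
end

# Wrapper: start from the defaults, add the keyword arguments to the
# config, then run with progress=true and testing=true.
function working_experiment_sketch(; kwargs...)
    config = default_config_sketch()
    for (k, v) in kwargs
        config[string(k)] = v
    end
    return main_experiment_sketch(config; progress=true, testing=true)
end

result = working_experiment_sketch(steps=500, numhidden=16)
```

In this sketch a keyword argument replaces a default with the same name (steps) and is added otherwise (numhidden).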
LunarLanderExperiment.default_config — Function
Automatically generated docs for LunarLanderExperiment config.
Experiment details
- seed::Int: seed of RNG
- steps::Int: Number of steps taken in the experiment
Logging Extras
By default the experiment will log and save (depending on the synopsis flag) the logging group :EXP. You can add extra logging groups and [group, name] pairs using the arguments below. Everything added to save_extras is logged automatically and passed to the save operation. The groups and names added to log_extras are omitted from save_results but are still passed back to the user through the data dict.
- log_extras::Vector{Union{String, Vector{String}}}: which groups and names to log to the data dict. These will not be passed to save.
- save_extras::Vector{Union{String, Vector{String}}}: which groups and names to log to the data dict. These will be passed to save.
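As a rough illustration of the argument shape, each entry can be either a group name or a [group, name] pair. In the sketch below, the alias ExtraSpec, the helper describe, and the group name "AGENT" are hypothetical; "loss" and "total_rews" are borrowed from the default performance output only as example names.

```julia
# Each entry is either a group name or a [group, name] pair.
const ExtraSpec = Union{String, Vector{String}}

# Hypothetical values, for illustration only.
log_extras = ExtraSpec[["EXP", "loss"], "AGENT"]
save_extras = ExtraSpec[["EXP", "total_rews"]]

# Describe one entry: a whole group, or a single name within a group.
describe(e::String) = "group $(e) (all names)"
describe(e::Vector{String}) = "group $(e[1]), name $(e[2])"

descriptions = map(describe, log_extras)
```

Dispatching on the entry type keeps the two cases separate without any manual type checks.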
Environment details
This experiment uses the DirectionalTMaze environment. The usable args are:
- size::Int: Size of the hallway in directional tmaze.
Agent details
RNN
The RNN used for this experiment and its total hidden size, as well as a flag to use (or not use) Zhu et al.'s deep action network. See
- cell::String: The type of cell. Many types are possible.
- deepaction::Bool: Whether to use Zhu et al.'s deep action idea for RNNs.
  - internal_a::Int: The size of the action representation layer when deepaction=true.
- numhidden::Int: Size of hidden state in RNNs.
Optimizer details
Flux optimizers are used. See the Flux documentation and ExpUtils.Flux.get_optimizer for details.
- opt::String: The name of the optimizer used
- Parameters defined by the particular optimizer.
Learning update and replay details including:
- Replay:
  - replay_size::Int: How many transitions are stored in the replay.
  - warm_up::Int: How many steps for warm-up (i.e. before learning begins).
- Update details:
  - lupdate::String: Learning update name
  - gamma::Float: the discount for learning update.
  - batch_size::Int: size of batch
  - truncation::Int: Length of sequences used for training.
  - update_wait::Int: Time between updates (counted in agent interactions)
  - target_update_wait::Int: Time between target network updates (counted in agent interactions)
  - hs_strategy::String: Strategy for dealing w/ hidden state in buffer.
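How replay_size, warm_up, update_wait and target_update_wait might interact can be sketched as below. The exact semantics are an assumption (no updates until warm_up steps have passed, then one every update_wait steps), and push_transition!, should_update and should_sync_target are hypothetical helpers, not the package's functions.

```julia
# Fixed-capacity replay: keep only the most recent replay_size transitions.
function push_transition!(buffer::Vector, transition, replay_size::Int)
    push!(buffer, transition)
    length(buffer) > replay_size && popfirst!(buffer)
    return buffer
end

# Assumed schedule: no learning during warm-up, then an update every
# update_wait agent interactions.
should_update(step; warm_up, update_wait) =
    step > warm_up && (step - warm_up) % update_wait == 0

# Same idea for copying weights into the target network.
should_sync_target(step; warm_up, target_update_wait) =
    step > warm_up && (step - warm_up) % target_update_wait == 0

buffer = Int[]
for t in 1:7
    push_transition!(buffer, t, 5)   # integers stand in for transitions
end

update_steps = [t for t in 1:20 if should_update(t; warm_up=10, update_wait=4)]
sync_steps = [t for t in 1:20 if should_sync_target(t; warm_up=10, target_update_wait=8)]
```

With these illustrative values the buffer holds the last five items, learning updates fall on steps 14 and 18, and the target network is synced on step 18.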
 
Default Performance
Time: 0:01:08
  episode:     100
  total_rews:  -249.31464
  loss:        146.8288
  l1:          0.6579351
  action:      4
  preds:       Float32[-18.375952, -17.984146, -17.7312, -17.009594]
LunarLanderExperiment.get_ann_size — Function
get_ann_size
Helper function which constructs the environment and agent using default config and kwargs then returns the number of parameters in the model.
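Counting the parameters of a model amounts to summing the number of elements in each weight array. The sketch below uses plain arrays as a stand-in for a model; count_parameters and the layer shapes (8 inputs, 16 hidden units, 4 actions) are hypothetical and do not reflect the actual network built by this module.

```julia
# Count scalar parameters across a collection of weight arrays.
count_parameters(arrays) = sum(length, arrays)

# Hypothetical feedforward shapes: 8 inputs, 16 hidden units, 4 actions.
weights = [
    zeros(Float32, 16, 8),   # input-to-hidden weights
    zeros(Float32, 16),      # hidden bias
    zeros(Float32, 4, 16),   # hidden-to-output weights
    zeros(Float32, 4),       # output bias
]

n_params = count_parameters(weights)   # 128 + 16 + 64 + 4 = 212
```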
LunarLanderExperiment.construct_agent — Function
construct_agent
Construct the agent for lunar lander.
LunarLanderExperiment.construct_env — Function
construct_environment