Training an Agent
Training a single-threaded agent is the most common case. Choose a preset using the -p flag and press enter.
python coach.py -p CartPole_DQN
Multi-threaded algorithms are very common these days.
They typically achieve the best results and scale gracefully with the number of threads.
In Coach, such algorithms are run by selecting a suitable preset and choosing the number of threads with the -n flag.
python coach.py -p CartPole_A3C -n 8
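When Coach is launched from a script rather than typed by hand, the same command can be assembled programmatically. The following is a minimal sketch and not part of Coach: `build_train_command` is a hypothetical helper, and it relies only on the -p and -n flags described above. Using the machine's core count as the worker count is an assumption about a reasonable default, not a Coach recommendation.

```python
import os


def build_train_command(preset, num_workers=None, script="coach.py"):
    """Return an argv list that trains the given preset (hypothetical helper)."""
    cmd = ["python", script, "-p", preset]
    if num_workers is not None:
        if num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        # -n selects the number of threads for multi-threaded presets.
        cmd += ["-n", str(num_workers)]
    return cmd


if __name__ == "__main__":
    # Assumption: one worker per CPU core is a sensible starting point.
    workers = os.cpu_count() or 1
    # Only print the command here; pass the list to
    # subprocess.run(cmd, check=True) to actually start training.
    print(" ".join(build_train_command("CartPole_A3C", num_workers=workers)))
```

Returning a list instead of a single string avoids shell quoting issues when the command is later handed to `subprocess.run`.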
Evaluating an Agent
There are several options for evaluating an agent during training:
For multi-threaded runs, an evaluation agent runs continuously in the background and evaluates the model during training.
For single-threaded runs, it is possible to define an evaluation period through the preset. Coach will then periodically pause training and run several evaluation episodes.
Additionally, it is possible to save checkpoints of the agent's networks and then run in evaluation mode only.
Checkpoints are saved by specifying the number of seconds between checkpoints with the -s flag.
The checkpoints will be saved into the experiment directory.
To load a model for evaluation, pass the experiment directory with the -crd flag and add the --evaluate flag to disable training.
python coach.py -p CartPole_DQN -s 60
python coach.py -p CartPole_DQN --evaluate -crd CHECKPOINT_RESTORE_DIR
Playing with the Environment as a Human
Interacting with the environment as a human can be useful for understanding its difficulties and for collecting data for imitation learning.
In Coach, this can be done by selecting a preset that defines the environment to use and specifying the --play flag.
When the environment is loaded, the available keyboard buttons will be printed to the screen.
Pressing the escape key when finished will end the simulation and store the replay buffer in the experiment directory.
python coach.py -p Breakout_DQN --play
Learning Through Imitation Learning
Learning through imitation of human behavior is a good way to speed up learning. In Coach, this can be done in two steps:
First, create a dataset of demonstrations by playing with the environment as a human. To do so, select an environment type and level through the command line, and specify the --play flag. After this step, a pickle of the replay buffer containing your game play will be stored in the experiment directory. The path to this replay buffer will be printed to the screen.
python coach.py -et Doom -lvl Basic --play
Next, use an imitation learning preset and set the replay buffer path accordingly. The path can be set either from the command line or from the preset itself.
python coach.py -p Doom_Basic_BC -cp='agent.load_memory_from_file_path=\"<experiment dir>/replay_buffer.p\"'
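The quoting in the command above is easy to get wrong, so it can help to check what the process actually receives. The sketch below uses Python's standard `shlex` module to reproduce the argument exactly as a POSIX shell would pass it for that command. `replay_buffer_override` and the `/tmp/my_experiment` directory are hypothetical names used only for illustration, and the sketch does not verify how Coach itself parses the value.

```python
import shlex


def replay_buffer_override(experiment_dir):
    """Build the -cp argument as a POSIX shell passes it (hypothetical helper)."""
    path = experiment_dir + "/replay_buffer.p"
    # Inside single quotes the shell keeps backslashes literally, so the
    # argument contains a backslash before each double quote.
    return '-cp=agent.load_memory_from_file_path=\\"' + path + '\\"'


if __name__ == "__main__":
    # Hypothetical experiment directory, for illustration only.
    argv = ["python", "coach.py", "-p", "Doom_Basic_BC",
            replay_buffer_override("/tmp/my_experiment")]
    # shlex.quote produces a shell-safe line that splits back into argv.
    shell_line = " ".join(shlex.quote(arg) for arg in argv)
    print(shell_line)
    print(shlex.split(shell_line) == argv)
```

When the command is started with `subprocess.run(argv)`, no shell is involved and the list can be passed as is.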
Rendering the Environment
Rendering the environment can be done by using the -r flag.
When working with multi-threaded algorithms, the rendered image shows the game play of the evaluation worker.
When working with single-threaded algorithms, the rendered image shows the single worker, which can be either training or evaluating.
Keep in mind that rendering the environment in single-threaded algorithms may slow down training to some extent.
When playing with the environment using the --play flag, the environment will be rendered automatically without the need for specifying the -r flag.
python coach.py -p Breakout_DQN -r
Coach allows storing GIFs of the agent's game play.
To dump GIF files, use the -dg flag.
The files are dumped after every evaluation episode, and are saved into the experiment directory, under a gifs sub-directory.
python coach.py -p Breakout_A3C -n 4 -dg
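After a run with GIF dumping enabled, the dumped files can be collected with a few lines of standard-library Python. This is a minimal sketch: `list_gifs` is a hypothetical helper, the gifs sub-directory name comes from the description above, and the `.gif` file extension is an assumption.

```python
from pathlib import Path


def list_gifs(experiment_dir):
    """Return the GIF files under <experiment_dir>/gifs, sorted by name."""
    gif_dir = Path(experiment_dir) / "gifs"
    if not gif_dir.is_dir():
        # No evaluation episode has finished yet, or GIF dumping was off.
        return []
    # Assumption: dumped files carry the .gif extension.
    return sorted(gif_dir.glob("*.gif"), key=lambda p: p.name)


if __name__ == "__main__":
    # Hypothetical experiment directory, for illustration only.
    for gif in list_gifs("experiments/my_experiment"):
        print(gif)
```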
Switching Between Deep Learning Frameworks
Coach uses TensorFlow as its main backend framework, but it also supports neon for some of the algorithms.
By default, TensorFlow will be used. It is possible to switch to neon using the -f flag.
python coach.py -p Doom_Basic_DQN -f neon
Coach has several additional flags that are worth knowing about.
Most of them are listed here, but the list may change from time to time.
The most up-to-date description can be found by using the -h flag.
|Flag|Type|Description|
|---|---|---|
|-p|string|Name of a preset to run (as configured in presets.py)|
|-l|flag|List all available presets|
|-e|string|Experiment name to be used to store the results|
|-f|string|Neural network framework. Available values: tensorflow, neon|
|-n|int|Number of workers for multi-process based agents, e.g. A3C|
|--play|flag|Play as a human by controlling the game with the keyboard. This option will save a replay buffer with the game play.|
|--evaluate|flag|Run evaluation only. This is a convenient way to disable training in order to evaluate an existing checkpoint.|
|-v|flag|Don't suppress TensorFlow debug prints|
|-s|int|Time in seconds between saving checkpoints of the model|
|-crd|string|Path to a folder containing a checkpoint to restore the model from|
|-dg|flag|Enable the GIF saving functionality|
|-at|string|Choose an agent type class to override on top of the selected preset. If no preset is defined, a preset can be assembled from the command line by combining the agent, environment and exploration policy type flags.|
|-et|string|Choose an environment type class to override on top of the selected preset. If no preset is defined, a preset can be assembled from the command line by combining the agent, environment and exploration policy type flags.|
|-ept|string|Choose an exploration policy type class to override on top of the selected preset. If no preset is defined, a preset can be assembled from the command line by combining the agent, environment and exploration policy type flags.|
|-lvl|string|Choose the level that will be played in the environment that was selected. This value will override the level parameter in the environment class.|
|-cp|string|Semicolon-separated parameters used to override specific parameters on top of the selected preset (or on top of the command-line assembled one). Whenever a parameter value is a string, it should be wrapped in escaped double quotes, e.g. \"string\", as in the imitation learning example.|
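Several overrides can be combined into one semicolon-separated value for the custom-parameter flag. The sketch below shows one way to assemble such a value. `format_custom_parameters` is a hypothetical helper, and `some.numeric_parameter` is a made-up parameter name used only to illustrate the separator; only `agent.load_memory_from_file_path` appears in this document.

```python
def format_custom_parameters(overrides):
    """Join overrides into one semicolon-separated string (hypothetical helper)."""
    parts = []
    for key, value in overrides.items():
        if isinstance(value, str):
            # String values are wrapped in escaped double quotes, following
            # the imitation learning example in this document.
            value = '\\"' + value + '\\"'
        parts.append("{}={}".format(key, value))
    return ";".join(parts)


if __name__ == "__main__":
    overrides = {
        "agent.load_memory_from_file_path": "/tmp/my_experiment/replay_buffer.p",
        "some.numeric_parameter": 5,  # made-up name, for illustration only
    }
    print(format_custom_parameters(overrides))
```

Non-string values are written as is, so numbers and booleans appear exactly as Python prints them.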