This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.
- Deep Q Learning (DQN) (Mnih et al. 2013)
- DQN with Fixed Q Targets (Mnih et al. 2015)
- Double DQN (van Hasselt et al. 2015)
- Double DQN with Prioritised Experience Replay (Schaul et al. 2016)
- REINFORCE (Williams 1992)
- DDPG (Lillicrap et al. 2016)
- TD3 (Fujimoto et al. 2018)
- PPO (Schulman et al. 2017)
- DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
- DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018)
- Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)
All implementations are able to quickly solve Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals). I plan to add A2C, A3C, Soft Actor-Critic and hierarchical RL algorithms soon.
The following environments are also implemented:
- Bit Flipping Game (as described in Andrychowicz et al. 2018)
- Four Rooms Game (as described in Sutton et al. 1999)
- Long Corridor Game (as described in Kulkarni et al. 2016)
The graphs below show various RL algorithms successfully learning the discrete-action game Cart Pole and the continuous-action game Mountain Car. Each curve is the mean result from running the algorithm with 3 random seeds, with the shaded area representing plus and minus one standard deviation. The hyperparameters used can be found in Results/Cart_Pole.py and Results/Mountain_Car.py.
The graphs below show the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) in the Bit Flipping (14 bits) and Fetch Reach environments described in the papers Hindsight Experience Replay (2018) and Multi-Goal Reinforcement Learning (2018). The results replicate those found in the papers and show how adding HER can allow an agent to solve problems that it otherwise would not be able to solve at all. Note that the same hyperparameters were used within each pair of agents, so the only difference between them was whether hindsight was used or not.
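To make the hindsight idea concrete, here is a minimal, self-contained sketch of HER-style relabelling on the bit-flipping game. This is not the repository's implementation: the function names and the "final achieved state as goal" relabelling strategy are chosen purely for illustration.

```python
import random

def flip_bits_episode(n_bits, policy, max_steps):
    """Roll out one episode of the bit-flipping game.

    The state is a tuple of bits; an action flips one bit. The reward is
    0 only when the state matches the goal pattern, and -1 otherwise.
    """
    state = tuple(random.randint(0, 1) for _ in range(n_bits))
    goal = tuple(random.randint(0, 1) for _ in range(n_bits))
    transitions = []
    for _ in range(max_steps):
        action = policy(state, goal)
        next_state = list(state)
        next_state[action] ^= 1  # flip the chosen bit
        next_state = tuple(next_state)
        reward = 0 if next_state == goal else -1
        transitions.append((state, action, reward, next_state, goal))
        state = next_state
        if reward == 0:
            break
    return transitions

def her_relabel(transitions):
    """Hindsight relabelling: pretend the state the episode actually
    ended in was the goal all along, so the otherwise-failed episode
    contains at least one transition with a non-negative reward."""
    achieved = transitions[-1][3]  # final next_state of the episode
    relabelled = []
    for state, action, _, next_state, _ in transitions:
        reward = 0 if next_state == achieved else -1
        relabelled.append((state, action, reward, next_state, achieved))
    return relabelled
```

In the real algorithm both the original and the relabelled transitions are stored in the replay buffer, which is what gives the agent a learning signal even when it never reaches the true goal.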
The graphs below show the performance of DQN and the hierarchical-DQN (h-DQN) algorithm from Kulkarni et al. 2016 on the Long Corridor environment, also described in Kulkarni et al. 2016. The environment requires the agent to go to the end of a corridor before coming back in order to receive a larger reward. This delayed gratification, combined with the aliasing of states, makes the game practically impossible for DQN to learn, but if we introduce a meta-controller (as in h-DQN) which directs a lower-level controller how to behave we are able to make more progress. This matches the results found in the paper.
The repository's high-level structure is:
```
├── Agents
│   ├── Actor_Critic_Agents
│   ├── DQN_Agents
│   ├── Policy_Gradient_Agents
│   └── Stochastic_Policy_Search_Agents
├── Environments
├── Results
│   └── Data_and_Graphs
├── Tests
├── Utilities
│   └── Data Structures
```
To watch all the different agents learn Cart Pole, follow these steps:

```bash
git clone https://github.com/p-christ/Deep_RL_Implementations.git
cd Deep_RL_Implementations
conda create --name myenvname -y
conda activate myenvname
pip3 install -r requirements.txt
export PYTHONPATH="${PYTHONPATH}:/Deep_RL_Implementations"
python Results/Cart_Pole.py
```
For other games change the last line to one of the other files in the Results folder.
To use the algorithms with your own particular game instead, follow these steps:

- Create an Environment class to represent your game. The class should extend the `Base_Environment` class found in the `Environments` folder to make it compatible with all the agents.
- Create a config object with the hyperparameters and game you want to use. See `Results/Cart_Pole.py` for an example of this.
- Use the `Trainer` class and its `run_games_for_agents` function to have the different agents play the game. Again, see `Results/Cart_Pole.py` for an example of this.
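As an illustrative sketch of the first step, the snippet below subclasses a stand-in base class. Note the stub's `reset`/`step` interface is an assumption made here for illustration: the repository's real `Base_Environment` defines its own abstract methods, so consult the `Environments` folder for the exact interface before subclassing.

```python
# NOTE: "Base_Environment" below is a local stub with an assumed
# interface, not the repository's actual class; the toy game is invented
# purely for this example.

class Base_Environment:
    """Stand-in for the repository's base environment class."""
    def reset(self):
        raise NotImplementedError
    def step(self, action):
        raise NotImplementedError

class Coin_Count_Environment(Base_Environment):
    """Toy game: the agent wins by taking action 1 five times."""
    def reset(self):
        self.count = 0
        return self.count
    def step(self, action):
        if action == 1:
            self.count += 1
        done = self.count >= 5
        reward = 1.0 if done else 0.0
        return self.count, reward, done
```

With such a class in place, the remaining steps are to build a config object holding your hyperparameters and this environment, and to hand it to the `Trainer`'s `run_games_for_agents` function, following the pattern in `Results/Cart_Pole.py`.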




