robot0102/Deep-Reinforcement-Learning-Algorithms-with-PyTorch

 
 


Deep Reinforcement Learning Algorithms with PyTorch

This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.

Algorithms Implemented

  1. Deep Q Learning (DQN) (Mnih et al. 2013)
  2. DQN with Fixed Q Targets (Mnih et al. 2013)
  3. Double DQN (Hado van Hasselt et al. 2015)
  4. Double DQN with Prioritised Experience Replay (Schaul et al. 2016)
  5. REINFORCE (Williams 1992)
  6. DDPG (Lillicrap et al. 2016)
  7. TD3 (Fujimoto et al. 2018)
  8. PPO (Schulman et al. 2017)
  9. DQN with Hindsight Experience Replay (DQN-HER) (Andrychowicz et al. 2018)
  10. DDPG with Hindsight Experience Replay (DDPG-HER) (Andrychowicz et al. 2018)
  11. Hierarchical-DQN (h-DQN) (Kulkarni et al. 2016)

Each implementation can quickly solve at least one of Cart Pole (discrete actions), Mountain Car Continuous (continuous actions), Bit Flipping (discrete actions with dynamic goals) or Fetch Reach (continuous actions with dynamic goals). I plan to add A2C, A3C, Soft Actor-Critic and further hierarchical RL algorithms soon.

Environments Implemented

  1. Bit Flipping Game (as described in Andrychowicz et al. 2018)
  2. Four Rooms Game (as described in Sutton et al. 1998)
  3. Long Corridor Game (as described in Kulkarni et al. 2016)
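
As a rough illustration of the first environment in the list above, here is a minimal bit-flipping sketch. It is not the repository's implementation: the class name `BitFlippingSketch`, the `reset`/`step` method names, the reward of -1 per unsolved step and the episode cap of `num_bits` steps are all assumptions made for this example.

```python
import random


class BitFlippingSketch:
    """Toy bit-flipping game: flip one bit per step until the state equals the goal.

    Hypothetical sketch; names, rewards and episode length are assumptions.
    """

    def __init__(self, num_bits=4, seed=None):
        self.num_bits = num_bits
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Draw a random start state and a random goal for the new episode.
        self.state = [self.rng.randint(0, 1) for _ in range(self.num_bits)]
        self.goal = [self.rng.randint(0, 1) for _ in range(self.num_bits)]
        self.steps = 0
        return list(self.state), list(self.goal)

    def step(self, action):
        # The action is the index of the bit to flip.
        self.state[action] ^= 1
        self.steps += 1
        solved = self.state == self.goal
        reward = 0 if solved else -1  # sparse reward (assumed values)
        done = solved or self.steps >= self.num_bits
        return list(self.state), reward, done
```

Because the goal changes every episode and the reward is sparse, an agent rarely sees a non-negative reward by chance once the number of bits grows, which is what makes this game a useful test for hindsight methods.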

Results

1. Cart Pole and Mountain Car

The plots below show various RL algorithms successfully learning the discrete-action game Cart Pole or the continuous-action game Mountain Car. Each curve is the mean result over 3 random seeds, and the shaded area represents plus and minus 1 standard deviation. The hyperparameters used can be found in Results/Cart_Pole.py and Results/Mountain_Car.py.

Cart Pole and Mountain Car Results
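
For readers who want to build a similar mean and shaded band from their own runs, a minimal sketch follows. The function name `summarise_runs` is hypothetical, and the use of the population standard deviation is an assumption; the repository may compute the band differently.

```python
from statistics import mean, pstdev


def summarise_runs(runs):
    """Return per-episode (means, lower, upper) across seeds.

    `runs` is a list with one equal-length list of episode scores per seed.
    The band is mean +/- 1 population standard deviation (an assumption).
    """
    means, lower, upper = [], [], []
    for scores in zip(*runs):  # scores of every seed at the same episode
        centre = mean(scores)
        spread = pstdev(scores)
        means.append(centre)
        lower.append(centre - spread)
        upper.append(centre + spread)
    return means, lower, upper
```

The `lower` and `upper` lists can then be passed to a plotting library's fill-between routine to draw the shaded area around the mean curve.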

2. Hindsight Experience Replay (HER) Experiments

The plots below show the performance of DQN and DDPG with and without Hindsight Experience Replay (HER) in the Bit Flipping (14 bits) and Fetch Reach environments described in the papers "Hindsight Experience Replay" (2018) and "Multi-Goal Reinforcement Learning" (2018). The results replicate those reported in the papers and show that adding HER can allow an agent to solve problems that it otherwise could not solve at all. The same hyperparameters were used within each pair of agents, so the only difference between them is whether hindsight was used.

HER Experiment Results
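
The core idea of hindsight can be sketched in a few lines: after an episode that failed to reach its goal, store the same transitions again while pretending that the state actually reached was the goal. The sketch below uses the "final" relabelling strategy; the function names, the tuple layout and the 0 / -1 reward are assumptions for illustration and may differ from what the repository's agents do.

```python
def sparse_reward(achieved, goal):
    # Assumed reward: 0 when the goal is reached, -1 otherwise.
    return 0 if achieved == goal else -1


def relabel_with_final_goal(episode, reward_fn=sparse_reward):
    """Relabel an episode with the goal it actually achieved.

    `episode` is a list of (state, action, next_state, goal) tuples.
    Returns (state, action, reward, next_state, goal, done) tuples whose
    goal is the last state reached in the episode.
    """
    hindsight_goal = episode[-1][2]
    relabelled = []
    for state, action, next_state, _original_goal in episode:
        reward = reward_fn(next_state, hindsight_goal)
        done = next_state == hindsight_goal
        relabelled.append((state, action, reward, next_state, hindsight_goal, done))
    return relabelled
```

Both the original and the relabelled transitions would go into the replay buffer, so the agent sees successful examples even when it never reaches the original goal.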

3. Hierarchical Reinforcement Learning Experiments

The plot below shows the performance of DQN and of hierarchical-DQN (h-DQN) from Kulkarni et al. 2016 on the Long Corridor environment, which is also described in Kulkarni et al. 2016. The environment requires the agent to go to the end of a corridor before coming back in order to receive a larger reward. This delayed gratification, together with the aliasing of states, makes the game very hard, if not impossible, for DQN to learn. If we introduce a meta-controller (as in h-DQN) that directs how a lower-level controller behaves, we are able to make more progress. This aligns with the results found in the paper.

h-DQN Long Corridor
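
To make the delayed reward and the state aliasing concrete, here is a simplified corridor sketch. It is not the repository's environment: the class name, the six-state length, the deterministic moves and the reward values 0.01 and 1.0 are assumptions chosen for illustration.

```python
class LongCorridorSketch:
    """Simplified corridor: exit on the left, larger reward if the far end was visited.

    Hypothetical sketch; corridor length, rewards and deterministic moves
    are assumptions.
    """

    def __init__(self, num_states=6, start=1, small_reward=0.01, large_reward=1.0):
        self.num_states = num_states
        self.start = start
        self.small_reward = small_reward
        self.large_reward = large_reward
        self.reset()

    def reset(self):
        self.position = self.start
        self.visited_end = False
        return self.position

    def step(self, action):
        """action: 0 moves left, 1 moves right."""
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.num_states - 1, self.position + move))
        if self.position == self.num_states - 1:
            self.visited_end = True
        done = self.position == 0  # the leftmost state ends the episode
        reward = 0.0
        if done:
            reward = self.large_reward if self.visited_end else self.small_reward
        # The observation is only the position, so the agent cannot see
        # whether the far end has already been visited (state aliasing).
        return self.position, reward, done
```

A greedy agent that steps left immediately collects the small reward after one move, whereas the large reward requires walking away from the exit first, with no intermediate signal along the way.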

Usage

The repository's high-level structure is:

├── Agents                    
    ├── Actor_Critic_Agents   
    ├── DQN_Agents         
    ├── Policy_Gradient_Agents
    └── Stochastic_Policy_Search_Agents 
├── Environments   
├── Results             
    └── Data_and_Graphs        
├── Tests
├── Utilities             
    └── Data Structures            

i) To Watch the Agents Learn the Above Games

To watch all the different agents learn Cart Pole follow these steps:

git clone https://github.com/p-christ/Deep_RL_Implementations.git
cd Deep_RL_Implementations

conda create --name myenvname --yes
conda activate myenvname

pip3 install -r requirements.txt
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

python Results/Cart_Pole.py

For other games change the last line to one of the other files in the Results folder.

ii) To Train the Agents on your Own Game

To use the algorithms with your own game instead, follow these steps:

  1. Create an Environment class to represent your game - the environment class you create should extend the Base_Environment class found in the Environments folder to make it compatible with all the agents.

  2. Create a config object with the hyperparameters and game you want to use. See Results/Cart_Pole.py for an example of this.

  3. Use the Trainer class and its run_games_for_agents function to have the different agents play the game. Again, see Results/Cart_Pole.py for an example of this.
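
The shape of step 1 can be sketched as follows. Everything here is hypothetical: `BaseEnvironmentStandIn` is a stand-in written for this example and is not the repository's Base_Environment, whose actual abstract methods should be checked in the Environments folder. `CountToTarget` and `run_episode` are likewise illustrative names.

```python
from abc import ABC, abstractmethod


class BaseEnvironmentStandIn(ABC):
    """Hypothetical stand-in for a base class; NOT the repository's Base_Environment."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the first state."""

    @abstractmethod
    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""


class CountToTarget(BaseEnvironmentStandIn):
    """Toy game: action 1 increments a counter, action 0 decrements it."""

    def __init__(self, target=3, max_steps=10):
        self.target = target
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.count = 0
        self.steps = 0
        return self.count

    def step(self, action):
        self.count += 1 if action == 1 else -1
        self.steps += 1
        solved = self.count == self.target
        done = solved or self.steps >= self.max_steps
        return self.count, (1.0 if solved else 0.0), done


def run_episode(env, policy):
    """Play one episode with `policy(state) -> action` and return the total reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total
```

Keeping every game behind one small interface is what lets the same agents and training loop run on any environment; in the repository that role is played by Base_Environment and the Trainer class rather than by the stand-ins above.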
