Bipedal Walker Evo
This project tries to solve OpenAI's bipedal walker using three different ways: Q-Learning, Mutation of Actions and Evolution Strategies.
Q-Learning
Coming soon
Action Mutation
Will get 0 reward, which is basically learning to prevent falling on it's head. The more actions the walker can use, the worse the reward.
This is because the walker tries to generate movement by trembling with it's legs. The covered distance doesn't cover the punishment for doing actions. So after 1600 moves the walker will get a reward around -60.
How it works
- Generate a population with a starting number randomized actions (we don't need enough actions to solve the problem right now)
- Let the population play the game reward every walker of the generation accordingly
- The best walker survives without mutating
- The better the reward the higher the chance to pass actions to next generation. Each child has a single parent, no crossover.
- Mutate all children and increment their number of actions
Hyperparameters
Parameter | Description | Interval | Our Choice |
---|---|---|---|
POP_SIZE |
Size of population. | [0;∞[ | 50 |
MUTATION_FACTOR |
Percentage of actions that will be mutated for each walker. | [0;1] | 0.2 |
BRAIN_SIZE |
Number of actions in the first generation. | [0;1600] | 50 |
INCREASE BY |
Incrementation of steps for each episode. | [0;∞[ | 5 |
Evolution Strategies
After 1000 episodes, which is about 1h of learning, it will reach ~250 reward.
Best score until now: 292/300 in 7000 episodes
How it works
- Generate a randomly weighted neural net
- Create a population of neural nets with mutated weights
- Let every net finish an episode and reward it accordingly
- The better the reward, the higher the chance to pass weights to next gen
Hyperparameters
Parameter | Description | Interval | Our Choice |
---|---|---|---|
HIDDEN_LAYER |
Size of hidden layer. | [1;∞[ | 12 |
BIAS |
Add a bias neuron to the input layer. | {0,1} | 0 |
POP_SIZE |
Size of population. | [0;∞[ | 50 |
MUTATION_FACTOR |
Percentage of weights that will be mutated for each mutant. | [0;1] | 0.1 |
LEARNING_RATE |
This is the rate of learning. | [0;1] | 0.03 |
GENS |
Number of generations. | [0;∞[ | 2000 |
MAX_STEPS |
Number of steps that are played in one episode. | [0; 1600] | 300 |
Installation
We use Windows, Anaconda and Python 3.7
conda create -n evo_neuro python=3.7
conda activate evo_neuro
conda install swig
pip install -r requirements.txt
Important Sources
Environment: https://github.com/openai/gym/wiki/BipedalWalker-v2
Table of all Environments: https://github.com/openai/gym/wiki/Table-of-environments
OpenAI Website: https://gym.openai.com/envs/BipedalWalker-v2/
More on evolution strategies: https://openai.com/blog/evolution-strategies/