Bipedal Walker Evo
This project tries to solve OpenAI's Bipedal Walker environment with three different approaches: Q-Learning, Mutation of Actions, and Evolution Strategies.

Q-Learning
Coming soon

Action Mutation
This approach reaches a reward of about 0, which basically means the walker learns not to fall on its head. The more actions the walker can use, the worse the reward.
This is because the walker tries to generate movement by trembling with its legs, and the distance covered does not make up for the penalty incurred by performing actions. After 1600 moves the walker ends up with a reward of around -60.


How it works

1. Generate a population in which every walker has a starting number of randomized actions (it does not need enough actions to solve the problem yet).
2. Let the population play the game and reward every walker of the generation accordingly.
3. The best walker survives without mutating.
4. The better a walker's reward, the higher its chance of passing its actions to the next generation. Each child has a single parent; there is no crossover.
5. Mutate all children and increment their number of actions.
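The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `evaluate` is a hypothetical stand-in for playing one episode (it only penalizes torque, so it runs without the environment), each action is assumed to be a vector of 4 torques in [-1, 1], and `MUTATION_FACTOR` is interpreted as a per-action mutation probability. The helper names `random_action`, `select_parent`, `mutate`, and `next_generation` are made up for this example.

```python
import random

POP_SIZE = 50          # walkers per generation
MUTATION_FACTOR = 0.2  # assumed: probability that a single action is replaced
BRAIN_SIZE = 50        # actions per walker in the first generation
INCREASE_BY = 5        # actions appended to each child
ACTION_DIM = 4         # assumed: one torque per joint, each in [-1, 1]


def random_action(rng):
    """Return one random action (a list of ACTION_DIM torques)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(ACTION_DIM)]


def evaluate(brain):
    """Hypothetical stand-in for an episode: only penalizes applied torque."""
    return -sum(abs(torque) for action in brain for torque in action)


def select_parent(population, rewards, rng):
    """Pick one parent; a better reward gives a higher selection chance."""
    lowest = min(rewards)
    # Shift rewards so every weight is strictly positive.
    weights = [reward - lowest + 1e-6 for reward in rewards]
    return rng.choices(population, weights=weights, k=1)[0]


def mutate(brain, rng):
    """Copy the parent's actions, replace some, and append new ones."""
    child = [list(action) for action in brain]
    for i in range(len(child)):
        if rng.random() < MUTATION_FACTOR:
            child[i] = random_action(rng)
    child.extend(random_action(rng) for _ in range(INCREASE_BY))
    return child


def next_generation(population, rng):
    """Run steps 2 to 5 once and return (new population, best reward)."""
    rewards = [evaluate(brain) for brain in population]
    best_reward = max(rewards)
    best = population[rewards.index(best_reward)]  # survives unmutated
    children = [
        mutate(select_parent(population, rewards, rng), rng)
        for _ in range(len(population) - 1)
    ]
    return [best] + children, best_reward


rng = random.Random(0)
population = [
    [random_action(rng) for _ in range(BRAIN_SIZE)] for _ in range(POP_SIZE)
]
new_population, best_reward = next_generation(population, rng)
```

In a real run, `evaluate` would feed the stored actions to the environment one step at a time and sum the rewards it returns.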


Hyperparameters


| Parameter | Description | Interval | Our Choice |
| --- | --- | --- | --- |
| POP_SIZE | Size of the population. | [0;∞[ | 50 |
| MUTATION_FACTOR | Fraction of actions that will be mutated for each walker. | [0;1] | 0.2 |
| BRAIN_SIZE | Number of actions in the first generation. | [0;1600] | 50 |
| INCREASE BY | Number of actions added for each episode. | [0;∞[ | 5 |
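To get a feel for how the last two parameters interact, here is a small arithmetic sketch. It assumes that a walker gains INCREASE BY actions in every generation and that an episode is capped at 1600 moves (the upper bound listed for BRAIN_SIZE). `MAX_STEPS`, `actions_at`, and `generations_to_fill_episode` are names invented for this example, and `INCREASE_BY` is simply a Python-friendly spelling of the parameter.

```python
BRAIN_SIZE = 50    # actions in the first generation
INCREASE_BY = 5    # actions added per generation
MAX_STEPS = 1600   # assumed episode length


def actions_at(generation):
    """Number of actions a continuously mutated walker has in a generation."""
    return min(BRAIN_SIZE + INCREASE_BY * generation, MAX_STEPS)


def generations_to_fill_episode():
    """Generations needed until a walker has MAX_STEPS actions."""
    missing = MAX_STEPS - BRAIN_SIZE
    return -(-missing // INCREASE_BY)  # ceiling division
```

With the values chosen above, a walker would need 310 generations before its action list covers a full 1600-move episode.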