From dd6e4b741fd9b62c72f3b8efa5244ac4dab1b5b8 Mon Sep 17 00:00:00 2001
From: Philip Maas <philip.maas@stud.hs-bochum.de>
Date: Mon, 28 Feb 2022 09:34:00 +0000
Subject: [PATCH] Update README.md

---
 README.md | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 762ee9c..2f7bd37 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,33 @@
 # Bipedal Walker Evo
 
-This project tries to solve OpenAI's bipedal walker with an evolutionary strategy.\
+This project tries to solve OpenAI's bipedal walker with three different approaches: Q-Learning, Action Mutation and Evolution Strategies.
+
+# Q-Learning
+Coming soon
+
+# Action Mutation
+Reaches a reward of about 0, which basically means the walker learns to avoid falling on its head.
+
+## How it works
+1. Generate a population of walkers, each with a small starting number of randomized actions (the first generations don't need enough actions to solve the whole episode)
+2. Let the population play the game and reward every walker of the generation accordingly
+3. The best walker survives without mutating
+4. The better the reward, the higher the chance to pass actions to the next generation; each child has a single parent, no crossover
+5. Mutate all children and increment their number of actions (see the sketch after this list)
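+
+Roughly, in code (a minimal sketch, not the repository's implementation: the `evaluate` and `mutate` helpers, the softmax selection, and the old Gym API are illustrative assumptions, with constants mirroring the hyperparameters below):
+
+```python
+import gym
+import numpy as np
+
+POP_SIZE = 50           # see the hyperparameter table below
+MUTATION_FACTOR = 0.1
+ACTIONS_START = 100
+INCREASE_BY = 5
+
+env = gym.make("BipedalWalker-v2")
+
+def evaluate(actions):
+    """Play one episode with a fixed action sequence, return the total reward."""
+    env.reset()
+    total = 0.0
+    for action in actions:
+        _, reward, done, _ = env.step(action)
+        total += reward
+        if done:
+            break
+    return total
+
+def mutate(actions):
+    """Re-randomize a fraction of the actions, then append fresh random ones."""
+    child = actions.copy()
+    mask = np.random.rand(len(child)) < MUTATION_FACTOR
+    child[mask] = np.random.uniform(-1, 1, size=(mask.sum(), 4))
+    extra = np.random.uniform(-1, 1, size=(INCREASE_BY, 4))
+    return np.concatenate([child, extra])
+
+# 1. Random starting population; each action is 4 joint torques in [-1, 1]
+population = [np.random.uniform(-1, 1, size=(ACTIONS_START, 4))
+              for _ in range(POP_SIZE)]
+
+for generation in range(100):
+    rewards = np.array([evaluate(p) for p in population])   # 2. reward walkers
+    elite = population[int(np.argmax(rewards))]              # 3. elitism
+    # 4. Higher reward -> higher selection probability (softmax), single parent
+    probs = np.exp((rewards - rewards.max()) / 50.0)
+    probs /= probs.sum()
+    parents = np.random.choice(POP_SIZE, size=POP_SIZE - 1, p=probs)
+    # 5. Mutate all children and grow their action sequences
+    population = [elite] + [mutate(population[i]) for i in parents]
+```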
+
+## Hyperparameters
+| Parameter         | Description                                                 | Interval  |
+|-------------------|-------------------------------------------------------------|-----------|
+| `POP_SIZE`        | Size of the population.                                     | [1; ∞[    |
+| `MUTATION_FACTOR` | Fraction of actions that are mutated for each child.        | [0; 1]    |
+| `ACTIONS_START`   | Number of actions in the first generation.                  | [0; 1600] |
+| `INCREASE_BY`     | Number of actions added in each generation.                 | [0; ∞[    |
+| `MAX_STEPS`       | Number of steps that are played in one episode.             | [0; 1600] |
+
+# Evolution Strategies
 After 1000 episodes, which is about 1h of learning, it will reach ~250 reward.\
-Best score until now: 292/300
+Best score so far: 292/300, reached after 7000 episodes \
+![Reward curve](</repository/EvolutionStrategies/Experiments/12 1 50 0.1 decaying 300/12_2_50_0.1_decaying_300.png> "Reward over episodes")
 
 ## How it works
 1. Generate a randomly weighted neural net
@@ -22,7 +47,7 @@ Best score until now: 292/300
 | `MAX_STEPS`       | Number of steps that are played in one episode.             | [0; 1600] |
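+
+The weight update itself fits in a few lines. Below is a minimal sketch of this kind of evolution strategy (illustrative only, not the repository's code: the linear policy, the `evaluate` helper and the constants are assumptions; the actual project trains a neural net with the hyperparameters above).
+
+```python
+import gym
+import numpy as np
+
+env = gym.make("BipedalWalker-v2")
+OBS_DIM, ACT_DIM = 24, 4           # BipedalWalker observation/action sizes
+N_PARAMS = OBS_DIM * ACT_DIM       # single linear layer, to keep the sketch short
+
+POP_SIZE = 50
+LEARNING_RATE = 0.1
+SIGMA = 0.1                        # std-dev of the weight perturbations
+
+def evaluate(flat_weights):
+    """Run one episode with a linear policy, return the total reward."""
+    W = flat_weights.reshape(OBS_DIM, ACT_DIM)
+    obs = env.reset()
+    total = 0.0
+    for _ in range(1600):          # MAX_STEPS
+        obs, reward, done, _ = env.step(np.tanh(obs @ W))
+        total += reward
+        if done:
+            break
+    return total
+
+weights = np.random.randn(N_PARAMS) * 0.1      # 1. random starting weights
+
+for episode in range(1000):
+    # Perturb the weights with Gaussian noise, once per population member
+    noise = np.random.randn(POP_SIZE, N_PARAMS)
+    rewards = np.array([evaluate(weights + SIGMA * eps) for eps in noise])
+    # Normalize the rewards and step toward the better perturbations
+    advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
+    weights += LEARNING_RATE / (POP_SIZE * SIGMA) * noise.T @ advantage
+```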
 
 
-## Installation
+# Installation
 We use Windows, Anaconda and Python 3.7 \
 `conda create -n evo_neuro python=3.7` \
 `conda activate evo_neuro`\
@@ -31,7 +56,7 @@ We use Windows, Anaconda and Python 3.7 \
 
 
 
-## Sources
+# Important Sources
 Environment: https://github.com/openai/gym/wiki/BipedalWalker-v2 \
 Table of all Environments: https://github.com/openai/gym/wiki/Table-of-environments \
 OpenAI Website: https://gym.openai.com/envs/BipedalWalker-v2/ \
-- 
GitLab