From dd6e4b741fd9b62c72f3b8efa5244ac4dab1b5b8 Mon Sep 17 00:00:00 2001
From: Philip Maas <philip.maas@stud.hs-bochum.de>
Date: Mon, 28 Feb 2022 09:34:00 +0000
Subject: [PATCH] Update README.md

---
 README.md | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 762ee9c..2f7bd37 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,33 @@
 # Bipedal Walker Evo
-This project tries to solve OpenAI's bipedal walker with an evolutionary strategy.\
+This project tries to solve OpenAI's bipedal walker using three different approaches: Q-Learning, Action Mutation and Evolution Strategies.\
+
+# Q-Learning
+Coming soon.
+
+# Action Mutation
+This approach only reaches a reward of about 0, which basically means the walker just learns not to fall on its head.
+
+## How it works
+1. Generate a population of walkers, each with a starting number of randomized actions (we don't need enough actions to solve the whole problem right away).
+2. Let the population play the game and reward every walker of the generation according to its performance.
+3. The best walker survives without mutating.
+4. The better the reward, the higher the chance to pass actions on to the next generation. Each child has a single parent; there is no crossover.
+5. Mutate all children and increment their number of actions.
+
+## Hyperparameters
+| Parameter | Description | Interval |
+|-------------------|-------------------------------------------------------------|-----------|
+| `POP_SIZE` | Size of the population. | [0;∞[ |
+| `MUTATION_FACTOR` | Percentage of actions that are mutated for each mutant. | [0;1] |
+| `ACTIONS_START` | Number of actions in the first generation. | [0;1600] |
+| `INCREASE_BY` | Number of actions added to every walker each generation. | [0;∞[ |
+| `MAX_STEPS` | Number of steps that are played in one episode. | [0; 1600] |
+
+# Evolution Strategies
 After 1000 episodes, which is about 1h of learning, it will reach ~250 reward.\
-Best score until now: 292/300
+Best score so far: 292/300, reached after 7000 episodes.\
+
 ## How it works
 1. Generate a randomly weighted neural net
@@ -22,7 +47,7 @@ Best score until now: 292/300
 | `MAX_STEPS` | Number of steps that are played in one episode. | [0; 1600] |
 
 
-## Installation
+# Installation
 We use Windows, Anaconda and Python 3.7 \
 `conda create -n evo_neuro python=3.7` \
 `conda activate evo_neuro`\
@@ -31,7 +56,7 @@ We use Windows, Anaconda and Python 3.7 \
 
 
-## Sources
+# Important Sources
 Environment: https://github.com/openai/gym/wiki/BipedalWalker-v2 \
 Table of all Environments: https://github.com/openai/gym/wiki/Table-of-environments
 OpenAI Website: https://gym.openai.com/envs/BipedalWalker-v2/ \
-- 
GitLab
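
Below is a minimal, self-contained sketch of the "Action Mutation" loop described in the README above (elitism, single-parent selection, mutation, and a growing action sequence). The hyperparameter values, the softmax-style parent selection and the helper functions `evaluate`/`mutate` are illustrative assumptions, not the project's actual implementation.

```python
# Sketch of the Action Mutation idea: evolve open-loop action sequences.
# Assumes the classic gym API (step returns 4 values); use "BipedalWalker-v3"
# on newer gym versions. Selection/mutation details are assumptions.
import gym
import numpy as np

POP_SIZE = 50
MUTATION_FACTOR = 0.2   # fraction of a child's actions that get re-randomized
ACTIONS_START = 50      # actions per walker in the first generation
INCREASE_BY = 5         # extra actions appended to each child per generation
GENERATIONS = 100

env = gym.make("BipedalWalker-v2")


def evaluate(actions):
    """Play one episode using a fixed, open-loop sequence of actions."""
    env.reset()
    total_reward = 0.0
    for action in actions:
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def mutate(actions):
    """Replace a random fraction of the actions with new random actions."""
    child = actions.copy()
    mask = np.random.rand(len(child)) < MUTATION_FACTOR
    child[mask] = np.random.uniform(-1, 1, size=(mask.sum(), 4))
    return child


# 1. Population of short random action sequences (4 joint torques per step).
population = [np.random.uniform(-1, 1, size=(ACTIONS_START, 4))
              for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    # 2. Reward every walker of the generation.
    rewards = np.array([evaluate(p) for p in population])

    # 3. The best walker survives unchanged (elitism).
    best = population[int(np.argmax(rewards))]

    # 4. Higher reward -> higher chance to become a (single) parent; no crossover.
    probs = np.exp(rewards - rewards.max())
    probs /= probs.sum()
    parent_ids = np.random.choice(POP_SIZE, size=POP_SIZE - 1, p=probs)

    # 5. Mutate all children and increment their number of actions.
    children = [np.concatenate([mutate(population[i]),
                                np.random.uniform(-1, 1, (INCREASE_BY, 4))])
                for i in parent_ids]
    population = [best] + children

    print(f"generation {gen}: best reward {rewards.max():.1f}")
```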
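
For the "Evolution Strategies" section, the README's step list is only partially visible in this patch, so the sketch below shows one common variant: an OpenAI-ES-style update on the weights of a small policy network. The network shape, `NOISE_STD`, `LEARNING_RATE` and the update rule are assumptions for illustration and may differ from the project's actual method.

```python
# Sketch of an evolution strategy on policy-network weights.
# Assumes the classic gym API and a tiny two-layer tanh policy;
# all sizes and learning constants are illustrative assumptions.
import gym
import numpy as np

env = gym.make("BipedalWalker-v2")
OBS_DIM, ACT_DIM, HIDDEN = 24, 4, 32
POP_SIZE = 50
NOISE_STD = 0.1
LEARNING_RATE = 0.03
MAX_STEPS = 1600
N_PARAMS = OBS_DIM * HIDDEN + HIDDEN * ACT_DIM


def policy(params, obs):
    """Map an observation to 4 joint torques using a flat weight vector."""
    w1 = params[:OBS_DIM * HIDDEN].reshape(OBS_DIM, HIDDEN)
    w2 = params[OBS_DIM * HIDDEN:].reshape(HIDDEN, ACT_DIM)
    return np.tanh(np.tanh(obs @ w1) @ w2)


def evaluate(params):
    """Total reward of one episode under the given weights."""
    obs = env.reset()
    total = 0.0
    for _ in range(MAX_STEPS):
        obs, reward, done, _ = env.step(policy(params, obs))
        total += reward
        if done:
            break
    return total


# 1. Start from a randomly weighted neural net.
theta = np.random.randn(N_PARAMS) * 0.1

for episode in range(1000):
    # 2. Sample a population of weight perturbations and evaluate each one.
    noise = np.random.randn(POP_SIZE, N_PARAMS)
    rewards = np.array([evaluate(theta + NOISE_STD * eps) for eps in noise])

    # 3. Move the weights toward the perturbations that scored well.
    advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += LEARNING_RATE / (POP_SIZE * NOISE_STD) * noise.T @ advantage

    print(f"episode {episode}: mean reward {rewards.mean():.1f}")
```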