diff --git a/README.md b/README.md
index 96ac156145a2b25f7f78f11d74d5b95c04b2e24b..bfe13a96ddf8060b8ddc21416e8618e91d681ff8 100644
--- a/README.md
+++ b/README.md
@@ -62,17 +62,20 @@ After 1000 episodes, which is about 1h of learning, it will reach ~250 reward.\
 ✅ Best score so far: 304/300 in under 7000 episodes with a decaying learning rate and mutation factor. \
 \
 Learning curve:\
-
+
 \
 \
 Rewards of fully learned agent in 50 episodes:\
-
+
+
 
 ## How it works
 1. Generate a randomly weighted neural net
 2. Create a population of neural nets with mutated weights
 3. Let every net finish an episode and reward it accordingly
 4. The better the reward, the higher the chance to pass weights on to the next gen
+Also: decay alpha and sigma to 0.05 after 1000 gens and to 0.01 after 5000 gens, so learning becomes more precise once the agent has escaped the local extremum of just standing around.
+
 ## Hyperparameters
 | Parameter | Description | Interval | Our Choice |
 |-------------------|-------------------------------------------------------------|-----------|------------|
@@ -80,7 +83,7 @@ Rewards of fully learned agent in 50 episodes:\
 | `BIAS` | Add a bias neuron to the input layer. | {0,1} | 0 |
 | `POP_SIZE` | Size of the population. | [0;∞[ | 50 |
 | `MUTATION_FACTOR` | Fraction of weights that are mutated in each mutant. | [0;1] | 0.1 |
-| `LEARNING_RATE` | This is the rate of learning. | [0;1] | 0.03 |
+| `LEARNING_RATE` | Step size of the weight update computed from the population's rewards. | [0;1] | 0.1 |
 | `GENS` | Number of generations. | [0;∞[ | 2000 |
 | `MAX_STEPS` | Number of steps that are played in one episode. | [0; 1600] | 300 |
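
For context beyond the diff itself: steps 1–4 plus the added decay note describe an evolution-strategies-style loop. The following minimal, self-contained sketch shows one common way such a loop can look, written as an OpenAI-style ES update against a toy reward function. It treats `MUTATION_FACTOR` as the noise scale sigma, as the decay note suggests; `episode_reward` and `TARGET` are hypothetical stand-ins for the real environment, and the reward-normalized update is an assumption, not code from this repository.

```python
import numpy as np

# Hyperparameters mirroring the README's table (illustrative only).
POP_SIZE, MUTATION_FACTOR, LEARNING_RATE, GENS = 50, 0.1, 0.1, 2000

rng = np.random.default_rng(0)
TARGET = rng.standard_normal(8)  # toy stand-in for the real environment

def episode_reward(weights):
    # Placeholder for step 3: run one episode and return its total reward.
    # Here: the closer the weights are to TARGET, the higher the reward.
    return -np.sum((weights - TARGET) ** 2)

alpha, sigma = LEARNING_RATE, MUTATION_FACTOR
parent = rng.standard_normal(8)                   # step 1: random weights

for gen in range(1, GENS + 1):
    noise = rng.standard_normal((POP_SIZE, parent.size))
    mutants = parent + sigma * noise              # step 2: mutated population
    rewards = np.array([episode_reward(m) for m in mutants])  # step 3
    # Step 4: normalize rewards so better-scoring mutants pull the
    # parent's weights harder toward themselves.
    advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    parent += alpha / (POP_SIZE * sigma) * noise.T @ advantage
    # The added note: decay alpha and sigma for finer late-stage learning.
    if gen == 1000:
        alpha = sigma = 0.05
    elif gen == 5000:
        alpha = sigma = 0.01

print("final reward:", episode_reward(parent))
```

With `GENS = 2000` the second decay step at gen 5000 never fires in this sketch; it is kept because the diff's note names both thresholds.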