2. Create a population of neural nets with mutated weights
3. Let every net finish an episode and reward it accordingly
4. The better the reward, the higher the chance to pass weights to next gen
Also: Decay alpha and sigma to 0.05 after 1000 gens and 0.01 after 5000 gens for a more precise learning after passing the local extrmum that is standing around.