By Daniel K., Age 14


     The NEAT neuroevolution algorithm is an advanced method of machine learning. Rather than creating many organisms and producing new variations of them until one succeeds by chance (as in evolution), NEAT assigns a reward value to desirable actions and tries to emulate human learning by slightly altering the neural network so that it performs more of the actions that lead to a reward. Unlike evolution-based learning, NEAT only has to simulate one neural network at a time.

     To understand the way NEAT works, we first need to talk about neural networks. Neural networks are the basis of machine learning and simulate a simplified human brain. A neural network consists of a single input layer of neurons, one or more hidden layers, and a single output layer. The input neurons connect to the next layer of neurons through connections with different weights (a weight controls how much of a signal the connection transmits). That layer connects to the next, and so on, until the network reaches the output neurons. The input neurons are activated according to the input data (image HSV values, text codes, and so on). The network then passes the data through its hidden layers until every neuron's value has been computed, and the values of the output neurons become the network's answer. If an output neuron is only partly activated, its activation can be interpreted as the network's certainty that its output is correct.
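A tiny forward pass makes this concrete. The sketch below is plain Python with made-up weights: input values flow through one hidden layer and one output layer, and each neuron squashes its weighted sum with a sigmoid so activations land between 0 (off) and 1 (fully activated):

```python
import math

def sigmoid(x):
    # Squash a signal into (0, 1); values near 1 mean "strongly activated".
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate input values through each layer of weights.

    `layers` is a list of weight matrices; layers[k][j][i] is the weight of
    the connection from neuron i in one layer to neuron j in the next.
    """
    values = inputs
    for weights in layers:
        values = [sigmoid(sum(w * v for w, v in zip(row, values)))
                  for row in weights]
    return values

# A toy network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = [[0.5, -0.4], [0.3, 0.8]]   # hypothetical hidden-layer weights
output = [[1.2, -0.7]]               # hypothetical output-layer weights
result = forward([1.0, 0.0], [hidden, output])
```

The single number in `result` is between 0 and 1, and can be read as the network's confidence, as described above.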

     The evolution method starts out with multiple neural networks, removes those that underperform, and ‘breeds’ those that succeed. NEAT, however, uses only one neural network and starts out with all connection weights active. Whenever the AI performs a task, it receives a reward or a punishment represented by a number: a positive number is a reward, and a negative number is a punishment. The magnitude of the number determines how strong the reward or punishment is, kind of like a points system. Usually, when humans or animals receive a reward, whatever it might be, they want to do more of the thing that earned it; when they get punished, the opposite happens, and they do less of the thing that led to the punishment. We can simulate this using NEAT.
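As a sketch of what such a points system might look like, here is a hypothetical scoring function for an imaginary game-playing agent. The event names and point values are invented for illustration:

```python
def score_action(result):
    # Hypothetical points system: positive numbers reward, negative numbers
    # punish, and the magnitude says how strongly to reinforce or discourage.
    rewards = {
        "reached_goal": +10.0,    # big reward
        "picked_up_item": +2.0,   # small reward
        "hit_wall": -1.0,         # mild punishment
        "fell_off_map": -10.0,    # severe punishment
    }
    return rewards.get(result, 0.0)  # neutral outcomes score zero

print(score_action("reached_goal"))  # → 10.0
print(score_action("hit_wall"))      # → -1.0
```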


     Whenever the AI gets a reward, we can ‘teach’ it to do more of the thing that earned the reward: we look at the neurons that were stimulated the most, effectively singling out the neurons that led to the reward, and amplify the weights of their connections. The increased weight, or transmissivity, of those connections raises the chance that the AI will repeat reward-earning actions in the future. We can also do the opposite to simulate a punishment and decrease the weights of the connections that caused the AI to get punished.
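This update can be sketched as follows. It is one possible interpretation, with a made-up learning rate: each connection is amplified in proportion to how strongly its source neuron fired and how large the reward was, and dampened instead when the reward is negative:

```python
def update_weights(layers, activations, reward, rate=0.01):
    """Nudge connection weights after a reward or punishment.

    layers[k][j][i] is the weight from neuron i in one layer to neuron j in
    the next; activations[k][i] records how strongly neuron i fired during
    the action being scored.  The rate of 0.01 is a made-up learning rate.
    """
    for weights, acts in zip(layers, activations):
        for row in weights:
            for i, act in enumerate(acts):
                # The more a source neuron contributed, the more its outgoing
                # connection is amplified (reward > 0) or dampened (reward < 0).
                row[i] += rate * reward * act * row[i]
    return layers

# One layer: two source neurons feeding one target neuron.
layers = [[[0.5, 0.2]]]
activations = [[1.0, 0.0]]   # only the first source neuron fired
update_weights(layers, activations, reward=10.0)
# The first weight grows; the silent neuron's weight is left untouched.
```

Note that the silent neuron's weight never changes, which is exactly the "singling out" described above: only the connections that contributed to the rewarded action get amplified.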

     Everything I just described happens live, the moment the AI gets rewarded or punished, which means the AI can get better at its task without restarting either the network or the simulation. It also means the AI can keep improving at whatever it was made to do while running in production, without having to pause to receive updates.
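As a minimal sketch of this live updating, with a made-up multiplicative rule: each reward or punishment adjusts a weight the instant it arrives, in place, with no separate retraining pass between actions:

```python
def live_update(weight, reward, rate=0.1):
    # Hypothetical rule: amplify a connection weight after a reward,
    # dampen it after a punishment -- applied immediately, in place.
    return weight * (1.0 + rate * reward)

w = 1.0
for reward in [+1.0, +1.0, -1.0]:   # two rewards, then one punishment
    w = live_update(w, reward)      # the network changes between actions
```

After the loop, `w` has already absorbed all three outcomes, even though the "simulation" never stopped or restarted.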