How evolution taught us the “genetic algorithm”.
In this article, I am going to explain the concept of the genetic algorithm. First, I will present its origin and its goal. Then I will show you how to implement a genetic algorithm with a short Python tutorial.
So, the question is:
How do you create a good artificial intelligence?
The naive solution is to create an “empirical algorithm”, a set of rules of the form “if you meet these conditions, act like that”. You could imagine that with enough such rules we could reproduce natural intelligence. But the amount of work is colossal, and the final solution will never be able to best its creator. Isn’t it depressing to spend a lot of time creating something while knowing it can’t be perfect?
To avoid that, a new idea emerged: what if, instead of crafting a solution directly, we recreated evolution? We could create the first prehistoric fishes, put them in conditions suitable for evolution, and let them evolve freely toward mankind or even further. This idea is called the “genetic algorithm”, and we are going to build one in the next part. First, let us refresh our memories and revisit natural selection as theorized by Darwin.
The theory is simple: for a population to thrive, it must constantly improve; this is the survival of the fittest. The best elements of the population should inspire the offspring, but the other individuals must not be forgotten, in order to maintain some diversity and stay able to adapt if the natural environment changes.
Genetic algorithms are especially efficient with optimization problems.
Example: the Knapsack problem
The knapsack problem is a classic algorithmic problem. You have two things: a knapsack with a capacity (the weight it can hold) and a set of boxes with different weights and different values. The goal is to fill the bag so that it is as valuable as possible without exceeding the maximum weight. It is a famous mathematical problem, proved NP-complete by Karp in 1972. A genetic algorithm is well suited to solve it because it is an optimization problem with a huge number of possible solutions.
Exact algorithms become impractical as the instance grows; see the Wikipedia page if you want to understand why.
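To make this concrete, here is a minimal sketch of how a genetic algorithm could score knapsack candidates. The weights, values, and capacity are made-up numbers for illustration; each candidate is a list of bits saying whether a box is packed.

```python
import random

# Hypothetical instance: the weights, values, and capacity are made up.
weights = [12, 7, 11, 8, 9]
values = [24, 13, 23, 15, 16]
capacity = 26

def knapsack_fitness(individual):
    """An individual is a list of bits: 1 means the matching box is packed."""
    total_weight = sum(w for w, bit in zip(weights, individual) if bit)
    total_value = sum(v for v, bit in zip(values, individual) if bit)
    # Overweight knapsacks are invalid solutions: give them a fitness of 0.
    return total_value if total_weight <= capacity else 0

# A random candidate, as a genetic algorithm would generate it:
candidate = [random.randint(0, 1) for _ in weights]
```

With this encoding, a genetic algorithm only has to evolve bit vectors; the fitness function handles all the problem-specific logic.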
In order to see for ourselves how a genetic algorithm works, we are going to use it to solve a simple problem: how could I crack my colleague’s password?
The algorithm is implemented in Python 3. You can download it here for Windows, or install it using brew install python3 (macOS), sudo apt-get install python3 (Debian/Ubuntu) or sudo yum install python3 (Red Hat/CentOS). I advise you to run this code inside a Jupyter notebook.
Choosing a fitness function
The evaluation function is the first step in creating a genetic algorithm. It is the function that estimates the success of each specimen: it will allow us to divide the population between the ugly ducklings and the beautiful swans. The point of this separation is that, later, the successful specimens will have a better “chance” of getting picked to form the next generation. As simple as it appears, don’t be fooled: it is the step of a genetic algorithm that carries the most problem-specific intelligence.
What is our goal? Cracking a password. Thus the goal of our function is to transform the binary result “fail or success” into a continuous score from 0 (can’t fail more) to 100 (perfection).
The simplest solution here is:
fitness score = 100 × (number of correct characters) / (total number of characters)
That way, an individual with a higher fitness score is a specimen genetically closer to success than the others, and the fitness function will accurately sort the population.
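Here is one possible Python version of this fitness function (a sketch; the names are my own choice):

```python
def fitness(password, test_word):
    """Score a guess from 0 (nothing right) to 100 (perfect match)."""
    if len(test_word) != len(password):
        return 0  # a word of the wrong length cannot be the password
    correct = sum(1 for expected, guessed in zip(password, test_word)
                  if expected == guessed)
    return correct / len(password) * 100
```

For instance, fitness("banana", "banaba") returns 83.33…, since five of the six letters are correct.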
Creating our individuals
So now we know how to evaluate our individuals; but how do we define them? This part is really tricky: the goal is to decide which characteristics are fixed and which are variable.
The comparison with genetics is really helpful here. DNA is composed of genes, and each of those genes comes in different versions called alleles. Genetic algorithms retain this concept of a population’s DNA.
In our case, our individuals are going to be words (obviously of the same length as the password). Each letter position is a gene, and the value of the letter is the allele. In the word “banana”, ‘b’ is the allele of the first gene.
What is the point of this representation?
- We know that each of our individuals keeps the right shape (a word of the correct length).
- Our population can cover every possibility (every possible word of this length).
Our genetic algorithm can then explore every possible combination.
Creating our first population
Now we know what the characteristics of our individuals are and how to evaluate their performance. We can start the “evolution” step of our genetic algorithm.
The main idea to keep in mind when we create the first population is that we must not point the population towards a solution that seems good. We must make the population as wide as possible and make it cover as many possibilities as possible. The perfect first population of a genetic algorithm should cover every existing allele.
So in our case, we are just going to create words composed only of random letters.
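A simple way to generate that first population, assuming the password only uses lowercase ASCII letters (widen the alphabet if your password may contain digits or symbols):

```python
import random
import string

def generate_word(length):
    """One individual: a word made of random letters."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

def generate_first_population(size, password_length):
    """A wide, unbiased first population of random words."""
    return [generate_word(password_length) for _ in range(size)]

population = generate_first_population(100, 6)
```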
From one generation to the next
Given a generation, in order to create the next one, we have two things to do. First, we select a specific part of the current generation. Then the genetic algorithm combines those breeders to create the next batch.
There are lots of ways to do this, but you must keep two ideas in mind: the goal is to select the best solutions of the previous generation while not completely putting aside the others. The risk is that if you select only the good solutions at the beginning of the genetic algorithm, you will converge really quickly towards a local optimum instead of the best possible solution.
My solution is to select, on the one hand, the N best specimens (in my code, N = best_sample) and, on the other hand, M random individuals regardless of fitness (M = lucky_few).
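Here is a sketch of that selection step. It assumes the population is already sorted from best fitness to worst; the parameter names match best_sample and lucky_few above.

```python
import random

def select_breeders(population_sorted, best_sample, lucky_few):
    """population_sorted must be ordered from best fitness to worst."""
    # Keep the N best specimens...
    breeders = list(population_sorted[:best_sample])
    # ...plus M lucky ones picked regardless of fitness, for diversity.
    breeders += random.sample(population_sorted[best_sample:], lucky_few)
    # Shuffle so couples are not systematically best-with-best.
    random.shuffle(breeders)
    return breeders

# Demo on a toy ranked population of 20 "words":
ranked = [f"word{i}" for i in range(20)]
breeders = select_breeders(ranked, best_sample=4, lucky_few=3)
```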
We can continue the biological analogy for the breeding step of our genetic algorithm. The goal of sexual reproduction is to mix the DNA of two individuals, so let’s do the same thing here. We have two individuals, “Tom” and “Jerry”; their DNA is defined by their alleles (the value of each letter). Thus, in order to mix their DNA, we just have to mix their letters. There are lots of ways to do this, so I picked the simplest solution: for each letter of the child, randomly take the corresponding letter of “Tom” or “Jerry”.
NB: Obviously, the couple “Tom” and “Jerry” isn’t going to produce only one child. You have to fix the number of children per couple in order to keep the population stable in your genetic algorithm: each generation must contain the same number of individuals as the previous one.
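Putting those two ideas together, here is one possible breeding step. The pairing scheme (first breeder with last, and so on) is an arbitrary choice among many:

```python
import random

def create_child(parent1, parent2):
    """Each letter of the child is taken at random from one of the two parents."""
    return "".join(random.choice(pair) for pair in zip(parent1, parent2))

def create_children(breeders, children_per_couple):
    """Pair the breeders two by two; each couple produces a fixed number of
    children so that the population size stays stable across generations."""
    next_generation = []
    for i in range(len(breeders) // 2):
        couple = (breeders[i], breeders[-i - 1])  # first with last, and so on
        for _ in range(children_per_couple):
            next_generation.append(create_child(*couple))
    return next_generation

child = create_child("tommy", "jerry")
```

With 40 breeders and 5 children per couple, the 20 couples produce 100 children, matching a starting population of 100.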
The last step of our genetic algorithm is the natural mutation of individuals. After the breeding, each individual must have a small probability of seeing its DNA change a little. The goal of this operation is to prevent the algorithm from getting stuck in a local optimum.
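A minimal mutation step could look like this. It changes a single random letter per mutated word, and chance_of_mutation is expressed as a percentage (both are assumptions of this sketch):

```python
import random
import string

def mutate_word(word):
    """Replace one letter, chosen at random, with a random letter."""
    index = random.randrange(len(word))
    return word[:index] + random.choice(string.ascii_lowercase) + word[index + 1:]

def mutate_population(population, chance_of_mutation):
    """Give each individual a chance_of_mutation % chance of mutating."""
    return [mutate_word(word) if random.random() * 100 < chance_of_mutation
            else word
            for word in population]

mutated = mutate_population(["banana"] * 10, chance_of_mutation=5)
```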
Now you have all the tools to build your own genetic algorithm. Feel free to modify my implementation. For each step, many solutions are possible; pick the one fittest for your problem.
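As an illustration, here is one way the whole loop could fit together. This is a self-contained sketch, not the downloadable implementation: the target password, population size, and every parameter value are arbitrary choices.

```python
import random
import string

PASSWORD = "banana"  # hypothetical target; all parameters below are arbitrary

def fitness(word):
    correct = sum(1 for a, b in zip(PASSWORD, word) if a == b)
    return correct / len(PASSWORD) * 100

def random_word():
    return "".join(random.choice(string.ascii_lowercase)
                   for _ in range(len(PASSWORD)))

def next_generation(population, best_sample=20, lucky_few=20,
                    children_per_couple=5, chance_of_mutation=5):
    # Selection: keep the best specimens, plus a few lucky ones.
    ranked = sorted(population, key=fitness, reverse=True)
    breeders = ranked[:best_sample] + random.sample(ranked[best_sample:], lucky_few)
    random.shuffle(breeders)
    # Breeding: each letter of a child comes from one parent or the other.
    children = []
    for i in range(len(breeders) // 2):
        for _ in range(children_per_couple):
            children.append("".join(random.choice(pair)
                                    for pair in zip(breeders[i], breeders[-i - 1])))
    # Mutation: a small chance to rewrite one random letter.
    for i, child in enumerate(children):
        if random.random() * 100 < chance_of_mutation:
            j = random.randrange(len(child))
            children[i] = child[:j] + random.choice(string.ascii_lowercase) + child[j + 1:]
    return children

population = [random_word() for _ in range(100)]
best = max(population, key=fitness)
for generation in range(300):
    if best == PASSWORD:
        break
    population = next_generation(population)
    best = max(population, key=fitness)
print(generation, best)
```

Note that 40 breeders × 20 couples × 5 children keeps the population at exactly 100 individuals per generation.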
Wanna go further?
Here is a list of other resources to practice, understand, and even compete in the field of AI and genetic algorithms.
BoxCar is an online example of a genetic algorithm. The goal is to evolve the most efficient two-wheeled vehicles. Check the result here.
This application asks more of you than the previous one: you have to create a “creature” with joints, bones, and muscles. Then the genetic algorithm tries to optimize your creature’s movements in order to execute a task: jump, run, climb stairs, and so on.
Check the app here.
Do it yourself
Finally, last but not least: Coding Game. It’s a platform bringing together lots of developers around a common passion: AI. Every month there is a one-week contest where you must create the best artificial intelligence possible, and the winner nearly always uses a genetic algorithm. So if you feel ready for the big leagues, take a deep breath and jump in here.
Thanks to Flavian Hautbois and Tristan Roussel.