Genetic Phrase

by Daniel Pottenger


Where does the population come from?

What makes a parent phrase?

Why do random characters keep appearing?

Is a bigger population better?

How about a higher mutation rate?

How do your genes work?

How do you create new phrases?

Where does the population come from?

The initial population is completely 'random'. Each phrase in it is a string of random characters, the same length as the phrase you entered.
The fun stuff happens after a generation has passed. New generations are made much like in the real world: phrases are selected as parents, and then a little baby phrase is born. This happens over and over until the new population is filled, and then we start all over again.
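A minimal sketch of how the initial population could be created. It is written in Python for illustration, since the app's own source isn't shown here. The function names `random_phrase` and `initial_population` are hypothetical, and the gene pool is assumed to be a plain string of allowed characters.

```python
import random
import string


def random_phrase(length: int, genes: str) -> str:
    """Build one phrase of `length` characters, each drawn uniformly from `genes`."""
    return "".join(random.choice(genes) for _ in range(length))


def initial_population(target: str, size: int, genes: str) -> list[str]:
    """Create `size` random phrases, each as long as the target phrase."""
    return [random_phrase(len(target), genes) for _ in range(size)]


# Example: 10 random candidates for "hello", using lower-case letters only.
population = initial_population("hello", 10, string.ascii_lowercase)
print(population)
```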

What makes a parent phrase?

Every genetic algorithm has a function that scores how good a candidate is. The function in this algorithm compares each character of a phrase with the character in the same position of the phrase you gave, and every match increases that phrase's 'fitness'. Once every phrase has been scored, a selection process picks phrases at random, with fitter phrases more likely to be picked. So if we're looking for Hello and Heilo is in our population, it has a good chance of being selected, at which point it makes a baby phrase with a second parent chosen the same way.
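The scoring and selection described above can be sketched as follows. This is an illustration under assumptions, not the app's actual code: the hypothetical `fitness` counts characters that match the target in the same position, and the hypothetical `select_parent` uses roulette-wheel selection (chance of being picked proportional to fitness), which is one common way to select a phrase "based on how fit it is".

```python
import random


def fitness(phrase: str, target: str) -> int:
    """Count the positions where `phrase` has the same character as `target`."""
    return sum(1 for a, b in zip(phrase, target) if a == b)


def select_parent(population: list[str], target: str) -> str:
    """Roulette-wheel selection: pick a phrase with probability proportional to its fitness."""
    scores = [fitness(p, target) for p in population]
    if sum(scores) == 0:
        # Nothing matches yet, so fall back to a uniform random pick.
        return random.choice(population)
    return random.choices(population, weights=scores, k=1)[0]


population = ["Heilo", "xyzzy", "Hqqqq"]
print([fitness(p, "Hello") for p in population])  # [4, 0, 1]
parent_a = select_parent(population, "Hello")
parent_b = select_parent(population, "Hello")
```

Here Heilo scores 4 out of 5, so with these three phrases it is picked about 4 times in 5, while xyzzy, with a score of 0, is never picked.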

Why do random characters keep appearing?

In the algorithm we start out with a base population. Depending on its size, you can imagine that some characters won't appear in it at all.
For example, imagine we're trying to find the word Hello and we generate a random population of 10 words. It is almost certain that some of the characters of Hello won't appear in the right position in any of them. To solve this, we borrow the idea of mutation from the real world: to give a little variation between generations, we mutate certain genes, or 'characters'.
In the algorithm, this means there is a chance of a mutation occurring, which puts a completely different character in place of one of the characters inherited from the parents. Sadly, this doesn't always work out for the best: if the mutation rate is too high, the characters may mutate so often that we never reach the required result.
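A per-character mutation step could look like this. It is a hypothetical sketch: `mutate` and its `rate` parameter are illustrative names, and it assumes each character mutates independently with the same probability.

```python
import random
import string


def mutate(phrase: str, genes: str, rate: float) -> str:
    """With probability `rate`, replace each character by a random gene."""
    chars = []
    for ch in phrase:
        if random.random() < rate:
            # A random gene; occasionally this is the same character again.
            chars.append(random.choice(genes))
        else:
            chars.append(ch)
    return "".join(chars)


genes = string.ascii_lowercase + "H"
child = mutate("Hello", genes, rate=0.01)
print(child)
```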

Is a bigger population better?

Yes and no. A bigger population introduces more initial variety in the phrases, but the selection process can take longer, because there are more phrases to score and choose from. There is usually a point where the benefits of a larger population drop off, and that point depends mostly on the size of your search space, that is, the number of possible phrases.
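A rough back-of-the-envelope model shows both the Hello example from the question about random characters and the diminishing returns of a bigger population. It assumes every initial character is drawn uniformly from the gene pool and that, because crossover keeps characters in place, what matters is whether some phrase already has the right character at each position. The function names are hypothetical.

```python
def search_space_size(num_genes: int, length: int) -> int:
    """Number of distinct phrases of `length` characters over `num_genes` genes."""
    return num_genes ** length


def coverage_probability(pop_size: int, num_genes: int, length: int) -> float:
    """Chance that, for every position, at least one random initial phrase
    already has the correct character there (genes drawn uniformly)."""
    per_position = 1 - (1 - 1 / num_genes) ** pop_size
    return per_position ** length


# "Hello": 26 lower-case letters plus the added capital "H" gives 27 genes.
print(search_space_size(27, 5))  # 14348907
for n in (10, 100, 1000):
    print(n, round(coverage_probability(n, 27, 5), 3))
```

Under this model, a population of 10 covers every position of Hello only about 0.3% of the time, 100 covers it roughly 89% of the time, and 1000 almost always does, so the step from 100 to 1000 gains far less than the step from 10 to 100.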

How about a higher mutation rate?

This was partly covered in one of the questions above, but I'll go into more depth. A generation could contain some really good phrases with high fitness, but if the mutation rate is high, their good characters can quickly be scrambled and their children can end up terrible.
Usually the mutation rate should be quite low. It is important to have some mutation, because certain genes may not appear in the population at all, but setting the rate too high can make it almost impossible to find the phrase you want.
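To put numbers on this, here is the chance that a child gets through without any mutation at all, assuming each character mutates independently with probability `rate`. The 20-character phrase length and the name `unmutated_probability` are arbitrary, hypothetical choices for illustration.

```python
def unmutated_probability(length: int, rate: float) -> float:
    """Chance that a child of `length` characters has no mutation at all,
    when each character mutates independently with probability `rate`.
    (If a mutation can re-pick the same character, the chance of the child
    staying unchanged is slightly higher than this.)"""
    return (1 - rate) ** length


for rate in (0.01, 0.1, 0.5):
    print(rate, round(unmutated_probability(20, rate), 3))
```

For a 20-character phrase that is about 82% at a 1% rate, about 12% at 10%, and less than one in a million at 50%, so at high rates even a perfect pair of parents rarely passes its phrase on intact.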

How do your genes work?

To keep it as simple as possible, the genes are just the letters of the alphabet, a - z, in lower case. Your phrase may use characters that aren't in this gene pool, such as capital letters, spaces or punctuation, so before the genetic algorithm runs, those characters are added to the possible choices, just so we have a chance of getting our phrase.
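A sketch of building the gene pool, assuming extra characters are simply appended in the order they first appear in the phrase; `build_gene_pool` is a hypothetical name.

```python
import string


def build_gene_pool(phrase: str) -> str:
    """Start with 'a'-'z' and append any characters of `phrase` not already present."""
    genes = string.ascii_lowercase
    for ch in phrase:
        if ch not in genes:
            genes += ch
    return genes


print(build_gene_pool("Hello, World"))  # abcdefghijklmnopqrstuvwxyzH, W
```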

How do you create new phrases?

Once we have our parent phrases, we run a function on them. This function is similar to the crossover of genes that happens in the real world. We select a point within the parent phrases where the genes cross over, and we take the part of the first parent before that point and the part of the second parent after it to create the new baby phrase. There are several ways to do this; this algorithm uses single-point crossover, so on average we take the first half of parent one and the second half of parent two. Other methods include uniform crossover, where each gene can come from either parent, but that actually made this particular genetic algorithm worse.
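Both crossover methods mentioned above can be sketched like this. The function names are hypothetical, both assume the parents have the same length (which holds here, since every phrase matches the target's length), and the cut point is drawn uniformly so that on average it lands halfway.

```python
import random


def single_point_crossover(parent_a: str, parent_b: str) -> str:
    """Take parent_a up to a random cut point, then parent_b from that point on."""
    point = random.randint(0, len(parent_a))  # randint includes both ends
    return parent_a[:point] + parent_b[point:]


def uniform_crossover(parent_a: str, parent_b: str) -> str:
    """Pick each character from either parent with equal probability."""
    return "".join(random.choice(pair) for pair in zip(parent_a, parent_b))


child = single_point_crossover("Heqqq", "zzllo")
print(child)
```

For example, with a cut point of 2, Heqqq and zzllo combine into Hello: He from the first parent and llo from the second.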