The initial population is completely random. Each member starts out as a string of random characters with the same length as the target phrase.
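As a rough illustration, here is a minimal sketch of how such a starting population could be built. The function names, the `size` parameter, and the default lower-case gene pool are assumptions made for this example, not the project's actual code.

```python
import random
import string

def random_phrase(length, genes=string.ascii_lowercase):
    """Build one phrase of `length` characters drawn at random from `genes`."""
    return "".join(random.choice(genes) for _ in range(length))

def initial_population(target, size, genes=string.ascii_lowercase):
    """Create `size` random phrases, each as long as the target phrase."""
    # Hypothetical helper: the real project may structure this differently.
    return [random_phrase(len(target), genes) for _ in range(size)]

population = initial_population("hello", size=10)
```

Every phrase in `population` has five characters here, because the target has five.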
The fun stuff happens after a generation has passed. New generations work a bit like they do in the real world: phrases are selected as parents, and then a little baby phrase is born.
This happens over and over, until a new population is filled, and then we start all over again.
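The "fill a new population" loop could look something like the sketch below. It assumes the new population has the same size as the old one, and it takes the selection and breeding steps as plain function arguments (`select_parent` and `make_child` are hypothetical names) so that the loop itself stays visible.

```python
def next_generation(population, select_parent, make_child):
    """Fill a new population of the same size, one baby phrase at a time."""
    new_population = []
    while len(new_population) < len(population):
        # Pick two parents with whatever selection rule is passed in.
        parent_a = select_parent(population)
        parent_b = select_parent(population)
        new_population.append(make_child(parent_a, parent_b))
    return new_population
```

Calling this once per generation, with the result fed back in as the next `population`, gives the "start all over again" behaviour.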
Every genetic algorithm has a function that says how good a candidate is, usually called the fitness function. The one in this algorithm looks at the characters of each phrase and checks whether they match the phrase given by you. Every match increases the 'fitness' of that phrase. After this is done for every phrase, a selection process picks a random phrase, weighted by how 'fit' it is. So if we're looking for Hello and we have Heio in our population, it has a good chance of being selected. It then makes the baby phrase with another parent, chosen by the same process.
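Here is a minimal sketch of a fitness function and a fitness-weighted selection step. Two things are assumptions for the sake of the example: characters are compared position by position, and every weight gets `+ 1` so that a population with no matches at all can still be sampled. The function names are made up.

```python
import random

def fitness(phrase, target):
    """Count the positions where the phrase has the same character as the target."""
    # Assumption: matches are checked position by position.
    return sum(1 for a, b in zip(phrase, target) if a == b)

def select_parent(population, target):
    """Pick one phrase at random, with fitter phrases more likely to be chosen."""
    # Assumption: +1 keeps every weight above zero, so random.choices
    # never receives weights that are all zero.
    weights = [fitness(phrase, target) + 1 for phrase in population]
    return random.choices(population, weights=weights, k=1)[0]
```

With these definitions, a phrase that matches four characters of the target gets weight 5 and one that matches none gets weight 1, so the better phrase is picked about five times as often.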
In the algorithm we start out with a base population. Depending on the initial size, you can imagine that sometimes certain characters won't appear.
For example, imagine we're trying to find the word Hello, and we generate a random population of 10 words. It is very likely that some of the characters of Hello won't appear where they are needed, or won't appear at all. To solve this, we introduce mutation. Just like in the real world, we mutate certain genes, or 'characters', to give a little variation between generations.
In the algorithm, this means there is a chance of a mutation occurring, which puts a completely different character in place of one of the parent's characters. Sadly, this doesn't always work out for the best: depending on the chance of the mutation occurring, the characters may mutate so often that we never get to the required result.
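A mutation step might be sketched like this. It assumes the mutation chance is applied to each character separately and that the replacement is drawn from the whole gene pool, so it can occasionally pick the same character again. `mutate` and `mutation_rate` are names invented for the example.

```python
import random
import string

def mutate(phrase, mutation_rate, genes=string.ascii_lowercase):
    """Swap each character for a random gene with probability `mutation_rate`."""
    # Assumption: the chance is rolled once per character, and the new
    # character comes from the full gene pool (it may equal the old one).
    return "".join(
        random.choice(genes) if random.random() < mutation_rate else ch
        for ch in phrase
    )
```

A rate of `0.0` leaves the phrase untouched, while a rate of `1.0` replaces every character, which shows why a high rate can wipe out good phrases.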
Yes and no. A bigger population introduces more initial variety in the phrases, but the selection process can take longer. There is usually a point where the benefits of a larger population drop off, and that point is usually determined by the size of your search space.
This was kinda covered in one of the above questions, but I'll go into more depth. You could have some really good selections in a generation, with a high fitness, but if the mutation rate is high, then these good selections can quickly become terrible.
Usually the mutation rate should be quite low. It is important to have it, because certain genes may not appear in the population at all, but having it too high can make it almost impossible to find the phrase you want.
To keep it as simple as possible, the genes are just the letters of the alphabet, a - z, in lower case. Depending on the phrase you use, you may use some characters not present in the gene pool, so before the genetic algorithm runs, those characters get added to the possible choices for the algorithm, just so we have a chance of getting our phrase.
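Extending the gene pool could be done along these lines. This is only a sketch: `build_gene_pool` is a hypothetical name, and the real code may store the genes differently.

```python
import string

def build_gene_pool(target, base=string.ascii_lowercase):
    """Return the base genes plus any target characters missing from them."""
    extras = []
    for ch in target:
        # Add each missing character once, in the order it first appears.
        if ch not in base and ch not in extras:
            extras.append(ch)
    return base + "".join(extras)

genes = build_gene_pool("Hello world!")
```

Here `genes` is the lower-case alphabet followed by `H`, a space, and `!`, since those are the only characters in the phrase that a - z doesn't cover.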
How do you create new phrases?
So, once we have our parent phrases, we run a function using them. This function is similar to what happens in the real world, where we have a crossover of genes. Basically, we select a point within the parent phrase where we want to cross over the genes, and at that point we take some of the first parent and some of the second parent to create the new baby phrase. There are several ways this can happen, but this algorithm uses single-point crossover, where we take, on average, the first half of parent one and the second half of parent two. Other methods of crossover include uniform crossover, where we take a gene here and a gene there, but this actually made this particular genetic algorithm worse.
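Single-point crossover can be sketched in a few lines. The version below assumes both parents have the same length and that the cut point is chosen uniformly at random, which is one way to get "on average the first half" of parent one. The function name is made up for the example.

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Take parent_a's genes before a random cut point and parent_b's from it onward."""
    # Assumption: parents are equally long; randint includes both endpoints,
    # so the child can occasionally be a full copy of either parent.
    point = random.randint(0, len(parent_a))
    return parent_a[:point] + parent_b[point:]

child = single_point_crossover("aaaaa", "bbbbb")
```

With these two parents, `child` is always some run of `a` characters followed by `b` characters, five characters in total.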