For this project, I've used a Genetic Algorithm approach. The main idea is to treat a random expression as a phenotype. So, we have a set of expression created randomly and the simulation compute how much they "fit" to the searched sequence and let them evolve; the fitness is defined as the difference between the searched values and the values evaluated by the random expression.
A brief example will explain it better: if we want our genetic algorithm to find the numeric sequence of the squares of the integer, we can give to the algorithm the squares as the input (note that I wrote a set of five numbers, but it works as well with more or a bit less values):
1, 4, 9, 16, 25which are the values of the squares of the first five integers. Now, for every expression randomly created, the expression is evaluated for the first five integers; if the random expression we want to evaluate is:
`pi^sin(e*n)/e`
the result of the expression is computed for the values of `n` as the first integers and compare them to the expected results (the squares of the first integers):
`n` | `n^2` | `pi^sin(e*n)/e` | delta |
1 | 1 | 0.588742495509 | 0.411257504491 |
2 | 4 | 0.156069184749 | 3.84393081525 |
3 | 9 | 1.09780155011 | 7.90219844989 |
4 | 16 | 0.118107625996 | 15.881892374 |
5 | 25 | 0.978670530567 | 24.0213294694 |
The delta column is the difference between the expression evaluation and the expected result; as we can see the sample expression is not very near to the values of the searched sequence. But this difference is measurable, and this is the reason why this delta was chosen as the value of the fitness function for the finder.
Every expression is formed by elements which are in the recursive form of element, followed by an operator and finally another element, all surrounded by parenthesis:
`(2 + n)`
`(2 + (pi * sin(n))`
Selection
On a population of 300 expressions, only the best 75 elements (the ones that have the best values of fitness) of the round remain in the arena; all the other are newly created (with random values).
Crossover
The crossover operation takes a random element of an expression and changes it with a random element of another expression. So these two expression:
`(pi*(cos(e*pi)+n))`
`((n + (pi * (tan(4*pi))))+10)`
can become:
`(pi*(cos(pi * (tan(4*pi)))+n))`
`((n + (e*pi))+10)`
Mutation
For the mutation we take a random element of the expression and we change it with a newly created random one.
Conclusions
The web version of the sequence finder is running at http://pythonplayground-andreaiacono.rhcloud.com/finder. You can run the default sequence of the numbers multiplied by two, and you'll see that most of the times, the finder will give the correct solution; when it can't find anything correct in 30 secs, it returns the best expression (in terms of fitness) it has found till then.
The sources are available on github: https://github.com/andreaiacono/python-playground/tree/master/sequence_finder.