# Genetic Programming Problem Definition Language

Gabriel Kronberger, last update 8th of July, 2013

GPDL aims to make genetic programming (GP) systems easier to use. At present, implementing a new problem in a GP system is cumbersome. Several factors limit the usefulness of existing GP implementations:

• A lot of boilerplate code has to be written to integrate a problem into the GP framework.
• It is necessary to learn the API of the GP system.
• Problem implementations cannot be reused to try different GP systems (e.g. ECJ, HeuristicLab, ...).

GPDL separates the implementation of problem details from the intricacies of algorithm implementations. Only the details of the problem are specified in a framework-independent way. A compiler can transform the problem description to source code for different GP systems. This way it will be much easier to implement problems and try to solve them with different GP implementations or even other kinds of solvers!

GPDL is not limited to C# or Java, and it is not tied to HeuristicLab (although HeuristicLab is the only system for which we have implemented a compiler so far).

GPDL is based on the concept of attributed grammars with semantic actions, which are commonly used in compiler construction. GP can be described as a search over the space of sentences of a formal language for the sentence with the optimal objective function value. A GP problem can therefore be defined as a tuple of a formal language, defined e.g. via a grammar, and an objective function for sentences. Our idea is to specify GP problems via an attributed grammar with semantic actions for the interpretation of sentences and an objective function to be minimized or maximized. Below you will find an example of how a symbolic regression problem can be defined in GPDL.
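The view of GP as search over sentences can be made concrete with a small sketch. The Python code below is not GPDL and does not use any GP framework; the toy grammar (`Expr -> add Expr Expr | mul Expr Expr | x | const`), the target function `x*x + 1`, and all function names are assumptions chosen for illustration. It also uses plain random search rather than genetic programming, only to show the two ingredients of a problem definition: a language of sentences and an objective function over them.

```python
import random


def random_sentence(rng, depth):
    """Derive a random sentence of the toy grammar as a nested tuple.

    Expr -> add Expr Expr | mul Expr Expr | x | const  (hypothetical grammar)
    """
    options = ["x", "const"] if depth == 0 else ["add", "mul", "x", "const"]
    choice = rng.choice(options)
    if choice == "x":
        return ("x",)
    if choice == "const":
        return ("const", rng.randint(-2, 2))
    return (choice,
            random_sentence(rng, depth - 1),
            random_sentence(rng, depth - 1))


def interpret(sentence, x):
    """Semantic interpretation: compute the value of a sentence for input x."""
    tag = sentence[0]
    if tag == "x":
        return x
    if tag == "const":
        return sentence[1]
    left = interpret(sentence[1], x)
    right = interpret(sentence[2], x)
    return left + right if tag == "add" else left * right


def objective(sentence):
    """Sum of squared errors against the assumed target x*x + 1 (to be minimized)."""
    return sum((interpret(sentence, x) - (x * x + 1)) ** 2 for x in range(-3, 4))


def random_search(seed=0, samples=2000, max_depth=3):
    """Sample sentences from the grammar and keep the one with the best objective."""
    rng = random.Random(seed)
    candidates = (random_sentence(rng, max_depth) for _ in range(samples))
    return min(candidates, key=objective)


best = random_search()
```

Swapping random search for a GP algorithm changes how candidate sentences are produced, but not the grammar or the objective function; this is the separation GPDL relies on.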

On this page we provide a first specification of the GPDL language and a reference implementation for a GPDL compiler for HeuristicLab.

## Motivating Example: Symbolic Regression

This is a fully self-contained specification of a symbolic regression problem for the Poly-10 benchmark. The file can be compiled with our reference GPDL compiler for HeuristicLab to create a solver for this problem.

PROBLEM SymbRegKoza
CODE <<
double[,] x;
double[] y;
string[] variableNames;
Dictionary<string,int> nameToCol;

double GetValue(double[,] data, string varName, int row) {
if(nameToCol == null) {
/* init mapping */
nameToCol = new Dictionary<string, int>();
for(int i=0; i<variableNames.Length; i++) {
nameToCol[variableNames[i]] = i;
}
}
return x[row, nameToCol[varName]];
}

double RSquared(IEnumerable<double> xs, IEnumerable<double> ys) {
HeuristicLab.Problems.DataAnalysis.OnlineCalculatorError error;
var r2 = HeuristicLab.Problems.DataAnalysis.OnlinePearsonsRSquaredCalculator.Calculate(xs, ys, out error);
if(error == HeuristicLab.Problems.DataAnalysis.OnlineCalculatorError.None) return r2;
else return 0.0;
}
>>

INIT <<
// generate 500 cases of the poly-10 benchmark function
int n = 500;
variableNames = new string[] {"x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10" };
var rand = new System.Random();
x = new double[n, 10];
y = new double[n];
for(int row = 0; row < n; row++) {
for(int col = 0; col < 10; col++) {
x[row, col] = rand.NextDouble() * 2.0 - 1.0;
}
y[row] = x[row, 0] * x[row, 1] +
x[row, 2] * x[row, 3] +
x[row, 4] * x[row, 5] +
x[row, 0] * x[row, 6] * x[row, 8] +
x[row, 2] * x[row, 5] * x[row, 9];
}
>>

/* non-terminals of the problem */
NONTERMINALS
Model<<int row, out double val>>.
RPB<<int row, out double val>>.
Addition<<int row, out double val>>.
Subtraction<<int row, out double val>>.
Multiplication<<int row, out double val>>.
Division<<int row, out double val>>.

/* terminals of the problem: random constants (ERC) and variables */
TERMINALS
ERC<<out double val>>
CONSTRAINTS
val IN RANGE <<-100>> .. <<100>>
.

Var<<out string varName>>
CONSTRAINTS
varName IN SET <<variableNames>>
.

/* grammar rules for the problem with interleaved semantic actions */
RULES
Model<<int row, out double val>> =
RPB<<row, out val>> .

RPB<<int row, out double val>> =                         LOCAL << string varName; >>
Addition<<row, out val>>
| Subtraction<<row, out val>>
| Division<<row, out val>>
| Multiplication<<row, out val>>
| Var<<out varName>>                                   SEM << val = GetValue(x, varName, row); >>
| ERC<<out val>>
.

Addition<<int row, out double val>> =                    LOCAL << double x1, x2; >>
RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 + x2; >>
.
Subtraction<<int row, out double val>> =                 LOCAL << double x1, x2; >>
RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 - x2; >>
.
Division<<int row, out double val>> =                    LOCAL << double x1, x2; >>
RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 / x2; >>
.
Multiplication<<int row, out double val>> =              LOCAL << double x1, x2; >>
RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 * x2; >>
.

/* objective function */
MAXIMIZE                                                   /* could also use the keyword MINIMIZE here */
<<
var rows = System.Linq.Enumerable.Range(0, x.GetLength(0));
var predicted = rows.Select(r => {
double result;
Model(r, out result);                                /* we can call the root symbol directly */
return result;
});
return RSquared(predicted, y);
>>
END SymbRegKoza.
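
To make the roles of the semantic actions and the objective function in the listing above more tangible, the following Python sketch mirrors them outside of any GP system. It is not output of the GPDL compiler: the tuple-based tree encoding, the function names, and the hand-written `r_squared` (a stand-in for HeuristicLab's `OnlinePearsonsRSquaredCalculator`) are assumptions for illustration. The target uses the usual Poly-10 definition, x1*x2 + x3*x4 + x5*x6 + x1*x7*x9 + x3*x6*x10.

```python
import random


def poly10(x):
    """Poly-10 target with 0-based indices: x[0] is x1, ..., x[9] is x10."""
    return (x[0] * x[1] + x[2] * x[3] + x[4] * x[5]
            + x[0] * x[6] * x[8] + x[2] * x[5] * x[9])


def make_data(n=500, seed=42):
    """Counterpart of the INIT block: n rows of 10 inputs drawn from [-1, 1]."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(n)]
    ys = [poly10(row) for row in xs]
    return xs, ys


def evaluate(node, row):
    """Counterpart of the SEM blocks: compute `val` for one symbol of the tree."""
    kind = node[0]
    if kind == "Var":
        return row[node[1]]          # node[1] is a column index (assumed encoding)
    if kind == "ERC":
        return node[1]               # node[1] is the constant value
    x1 = evaluate(node[1], row)
    x2 = evaluate(node[2], row)
    if kind == "Addition":
        return x1 + x2
    if kind == "Subtraction":
        return x1 - x2
    if kind == "Multiplication":
        return x1 * x2
    if kind == "Division":
        # Unprotected, as in the listing; unlike C# doubles, Python raises
        # ZeroDivisionError when x2 == 0.
        return x1 / x2
    raise ValueError("unknown symbol: " + kind)


def r_squared(predicted, target):
    """Squared Pearson correlation; 0.0 if it is undefined (constant input)."""
    n = len(target)
    mp = sum(predicted) / n
    mt = sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(predicted, target))
    vp = sum((p - mp) ** 2 for p in predicted)
    vt = sum((t - mt) ** 2 for t in target)
    if vp == 0.0 or vt == 0.0:
        return 0.0
    return cov * cov / (vp * vt)


def quality(tree, xs, ys):
    """Counterpart of the MAXIMIZE block: evaluate the tree on every row."""
    predicted = [evaluate(tree, row) for row in xs]
    return r_squared(predicted, ys)


def var(i):
    return ("Var", i)


def mul(a, b):
    return ("Multiplication", a, b)


def add(a, b):
    return ("Addition", a, b)


# A hand-built sentence that encodes the Poly-10 function itself.
exact = add(
    add(add(mul(var(0), var(1)), mul(var(2), var(3))), mul(var(4), var(5))),
    add(mul(mul(var(0), var(6)), var(8)), mul(mul(var(2), var(5)), var(9))),
)

xs, ys = make_data()
exact_quality = quality(exact, xs, ys)
constant_quality = quality(("ERC", 1.0), xs, ys)
```

A GP system would search for a tree such as `exact` instead of being handed one; the sketch only shows how a candidate sentence is turned into a quality value.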