= Genetic Programming Problem Definition Language (GPDL) =
[[http://heal.heuristiclab.com/team/kronberger | Gabriel Kronberger]], last update 19th of July, 2013
gkronber@heuristiclab.com
The aim of GPDL is to make GP systems easier to use.
Implementing new problems in GP systems is currently very
cumbersome for several reasons, including:
* a lot of boiler-plate code has to be written to integrate into
the GP framework,
* it is necessary to learn the APIs of different GP systems,
* problem implementations cannot be re-used to try different GP
systems (e.g. ECJ, HeuristicLab, GEVA, JGAP, ...).
We argue that the uptake of GP for real-world applications has
been limited so far because the available high-quality
implementations of GP are difficult to use and implementing more
complex GP problems takes a lot of time.
GPDL separates the implementation of problem details from the
intricacies of algorithm implementations. Only the details of the
problem are specified in a framework-independent way. A compiler
can transform the problem description to source code for
different GP systems. This way, it will be much easier to
implement problems and to use different GP implementations or
even other kinds of solvers!
GPDL is based on the concept of attributed grammars with semantic
actions, which are commonly used in compiler construction. GP can
be described as a search over the space of sentences of a formal
language, where the goal is to find a sentence with an optimal
objective function value. A GP problem can therefore be defined
as a tuple of a formal language, defined e.g. via a grammar, and
an objective function for sentences. Our idea is to specify GP
problems via an attributed grammar with semantic actions for the
interpretation of sentences, together with an objective function
to be minimized or maximized. An example definition of a symbolic
regression problem in GPDL is given further down on this page.
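The view of a GP problem as a tuple of a language and an objective function can be illustrated with a minimal C# sketch. It is not GPDL and not generated by the GPDL compiler: the toy grammar `E -> x | 1 | (E + E) | (E * E)`, the class names (`Expr`, `Var`, `One`, `Bin`, `ToyProblem`) and the target function `x*x + 1` are assumptions made only for this illustration. Plain random search is used in place of GP to emphasize that the problem definition is independent of the solver.

{{{
#!csharp
using System;

// Sentences of the toy language  E -> x | 1 | (E + E) | (E * E)
abstract class Expr {
  public abstract double Eval(double x);
}
sealed class Var : Expr {
  public override double Eval(double x) { return x; }
  public override string ToString() { return "x"; }
}
sealed class One : Expr {
  public override double Eval(double x) { return 1.0; }
  public override string ToString() { return "1"; }
}
sealed class Bin : Expr {
  private readonly char op;
  private readonly Expr left, right;
  public Bin(char op, Expr left, Expr right) {
    this.op = op; this.left = left; this.right = right;
  }
  public override double Eval(double x) {
    double a = left.Eval(x), b = right.Eval(x);
    return op == '+' ? a + b : a * b;
  }
  public override string ToString() {
    return "(" + left + " " + op + " " + right + ")";
  }
}

static class ToyProblem {
  // The language: samples a random sentence with limited depth.
  public static Expr RandomSentence(Random rand, int maxDepth) {
    // at depth 0 only the terminals x and 1 are allowed
    int choice = maxDepth <= 0 ? rand.Next(2) : rand.Next(4);
    switch (choice) {
      case 0: return new Var();
      case 1: return new One();
      case 2: return new Bin('+', RandomSentence(rand, maxDepth - 1),
                                  RandomSentence(rand, maxDepth - 1));
      default: return new Bin('*', RandomSentence(rand, maxDepth - 1),
                                   RandomSentence(rand, maxDepth - 1));
    }
  }

  // The objective (to be minimized): sum of squared errors
  // against the assumed target x*x + 1 on 11 sample points in [-1, 1].
  public static double Objective(Expr e) {
    double sse = 0.0;
    for (int i = -5; i <= 5; i++) {
      double x = i / 5.0;
      double d = e.Eval(x) - (x * x + 1.0);
      sse += d * d;
    }
    return sse;
  }

  public static void Main() {
    // Random search stands in for a GP solver here.
    var rand = new Random(42);
    Expr best = null;
    double bestObj = double.PositiveInfinity;
    for (int i = 0; i < 10000; i++) {
      Expr candidate = RandomSentence(rand, 3);
      double q = Objective(candidate);
      if (q < bestObj) { best = candidate; bestObj = q; }
    }
    Console.WriteLine(best + " -> SSE " + bestObj);
  }
}
}}}

Only `RandomSentence` and `Objective` describe the problem; the loop in `Main` could be replaced by any other search strategy without touching them.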
GPDL is not limited to C# or Java, and it is not limited to
HeuristicLab; however, HeuristicLab is the only system for which
we have implemented a compiler so far.
On this page we provide a first specification of the GPDL
language and a reference implementation for a GPDL compiler for
HeuristicLab.
== Documentation ==
* [[http://dev.heuristiclab.com/trac/hl/core/raw-attachment/wiki/UsersGPDL/gpdl-kronberger-2013.pdf | GECCO 2013 Paper: GPDL: A Framework-Independent Problem Definition Language for Grammar Guided Genetic Programming (GECCO 2013)]]
* [[http://dev.heuristiclab.com/trac/hl/core/raw-attachment/wiki/UsersGPDL/Presentation Gecco 2013.pdf | GECCO 2013 Presentation in the "Evolutionary Software Systems Workshop"]]
== Syntax Definition ==
* [[source:/branches/HeuristicLab.Problems.GPDL/SyntaxAnalyzer/GPDef.atg|GPDef.atg]] (for [[http://www.ssw.uni-linz.ac.at/coco/ | Coco/R]])
== Reference Implementation for HeuristicLab ==
* [[source:/branches/HeuristicLab.Problems.GPDL/SyntaxAnalyzer/|GPDL Syntax Analyzer]]
* [[source:/branches/HeuristicLab.Problems.GPDL/GpdlCompiler/|GPDL Compiler (command line)]]
* [[source:/branches/HeuristicLab.Problems.GPDL/HeuristicLab.Problems.GPDL/3.4/|GPDL plugin for HeuristicLab]]
== Tools ==
* [[source:/branches/HeuristicLab.Problems.GPDL/CocoR|Coco/R (executables and frame files, .NET-version)]]
== Example Problem Definitions ==
* [[source:/branches/HeuristicLab.Problems.GPDL/HeuristicLab.Problems.GPDL.Views/3.4/Resources/symbreg Koza.txt|Koza-style symbolic regression]]
* [[source:/branches/HeuristicLab.Problems.GPDL/HeuristicLab.Problems.GPDL.Views/3.4/Resources/symbreg HEAL.txt|Symbolic regression with evolution of constants]]
* [[source:/branches/HeuristicLab.Problems.GPDL/HeuristicLab.Problems.GPDL.Views/3.4/Resources/Artificial Ant.txt|Artificial Ant]]
* [[source:/branches/HeuristicLab.Problems.GPDL/HeuristicLab.Problems.GPDL.Views/3.4/Resources/Factorial.txt|Factorial function]]
* [[source:/branches/HeuristicLab.Problems.GPDL/HeuristicLab.Problems.GPDL.Views/3.4/Resources/Fib.txt|Fibonacci function]]
* [[source:/branches/HeuristicLab.Problems.GPDL/HeuristicLab.Problems.GPDL.Views/3.4/Resources/multi-output-multiplier.txt|Multi-output multiplier]]
== Changelog ==
* 2013/07/19: transformed ATG for the GPDL syntax analyzer and compiler to Coco/R syntax (from Coco-2)
* 2013/07/08: first release at GECCO
== Motivating Example: Symbolic Regression ==
This is a fully self-contained specification of a symbolic regression problem (Poly-10 benchmark). This file can be compiled using our reference GPDL compiler for HeuristicLab to create a solver for this problem. Different kinds of solvers available in HeuristicLab can be used to solve the problem without any changes to the problem specification.
{{{
#!csharp
PROBLEM SymbRegKoza
CODE <<
double[,] x;
double[] y;
string[] variableNames;
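Dictionary<string, int> nameToCol;
double GetValue(double[,] data, string varName, int row) {
if(nameToCol == null) {
/* init mapping */
nameToCol = new Dictionary<string, int>();
for(int i=0; i<variableNames.Length; i++) {
nameToCol[variableNames[i]] = i;
}
}
return data[row, nameToCol[varName]];
}
double RSquared(IEnumerable<double> xs, IEnumerable<double> ys) {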
HeuristicLab.Problems.DataAnalysis.OnlineCalculatorError error;
var r2 = HeuristicLab.Problems.DataAnalysis.OnlinePearsonsRSquaredCalculator.Calculate(xs, ys, out error);
if(error == HeuristicLab.Problems.DataAnalysis.OnlineCalculatorError.None) return r2;
else return 0.0;
}
>>
INIT <<
// generate 500 cases of the poly-10 benchmark function
int n = 500;
variableNames = new string[] {"x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10" };
var rand = new System.Random();
x = new double[n, 10];
y = new double[n];
for(int row = 0; row < n; row++) {
for(int col = 0; col < 10; col++) {
x[row, col] = rand.NextDouble() * 2.0 - 1.0;
}
y[row] = x[row, 0] * x[row, 1] +
x[row, 2] * x[row, 3] +
x[row, 4] * x[row, 5] +
x[row, 0] * x[row, 6] * x[row, 8] +
x[row, 2] * x[row, 5] * x[row, 9];
}
>>
/* non-terminals of the problem */
NONTERMINALS
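  Model<<int row, out double val>>.
  RPB<<int row, out double val>>.
  Addition<<int row, out double val>>.
  Subtraction<<int row, out double val>>.
  Multiplication<<int row, out double val>>.
  Division<<int row, out double val>>.
/* terminals of the problem: random constants (ERC) and variables */
TERMINALS
  ERC<<out double val>>
    CONSTRAINTS
      val IN RANGE <<-100>> .. <<100>>
  .
  Var<<out string varName>>
    CONSTRAINTS
      varName IN SET <<variableNames>>
  .
/* grammar rules for the problem with interleaved semantic actions */
RULES
  Model<<int row, out double val>> =
    RPB<<row, out val>> .
  RPB<<int row, out double val>> = LOCAL << string varName; >>
    Addition<<row, out val>>
    | Subtraction<<row, out val>>
    | Division<<row, out val>>
    | Multiplication<<row, out val>>
    | Var<<out varName>> SEM << val = GetValue(x, varName, row); >>
    | ERC<<out val>>
  .
  Addition<<int row, out double val>> = LOCAL << double x1, x2; >>
    RPB<<row, out x1>> RPB<<row, out x2>> SEM << val = x1 + x2; >>
  .
  Subtraction<<int row, out double val>> = LOCAL << double x1, x2; >>
    RPB<<row, out x1>> RPB<<row, out x2>> SEM << val = x1 - x2; >>
  .
  Division<<int row, out double val>> = LOCAL << double x1, x2; >>
    RPB<<row, out x1>> RPB<<row, out x2>> SEM << val = x1 / x2; >>
  .
  Multiplication<<int row, out double val>> = LOCAL << double x1, x2; >>
    RPB<<row, out x1>> RPB<<row, out x2>> SEM << val = x1 * x2; >>
  .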
/* objective function */
MAXIMIZE /* could also use the keyword MINIMIZE here */
<<
var rows = System.Linq.Enumerable.Range(0, x.GetLength(0));
var predicted = rows.Select(r => {
double result;
Model(r, out result); /* we can call the root symbol directly */
return result;
});
return RSquared(predicted, y);
>>
END SymbRegKoza.
}}}
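To make the role of the semantic actions more concrete, the following hand-written C# sketch interprets a sentence of the grammar above. It assumes, as suggested by the call `Model(r, out result)` in the objective function, that each non-terminal takes a row index and returns its value through an `out` parameter. The `Node` and `Interpreter` classes are hypothetical and exist only for this illustration; this is not the code emitted by the reference compiler.

{{{
#!csharp
using System;

// Hypothetical tree node for a sentence of the grammar.
sealed class Node {
  public string Symbol;                  // e.g. "Addition", "Var", "ERC"
  public Node[] Children = new Node[0];
  public double Constant;                // value of an ERC terminal
  public string VarName;                 // name held by a Var terminal
}

sealed class Interpreter {
  private readonly double[,] x;
  private readonly string[] variableNames;

  public Interpreter(double[,] x, string[] variableNames) {
    this.x = x;
    this.variableNames = variableNames;
  }

  private double GetValue(string varName, int row) {
    int col = Array.IndexOf(variableNames, varName);
    if (col < 0) throw new ArgumentException("unknown variable " + varName);
    return x[row, col];
  }

  // Root symbol: Model = RPB
  public void Model(Node n, int row, out double val) {
    RPB(n, row, out val);
  }

  // RPB selects one alternative; terminals carry their own value.
  private void RPB(Node n, int row, out double val) {
    switch (n.Symbol) {
      case "Var": val = GetValue(n.VarName, row); break;
      case "ERC": val = n.Constant; break;
      default: Binary(n, row, out val); break;
    }
  }

  // Binary operators evaluate both sub-sentences and then run
  // the operator's semantic action (val = x1 op x2).
  private void Binary(Node n, int row, out double val) {
    double x1, x2;
    RPB(n.Children[0], row, out x1);
    RPB(n.Children[1], row, out x2);
    switch (n.Symbol) {
      case "Addition": val = x1 + x2; break;
      case "Subtraction": val = x1 - x2; break;
      case "Multiplication": val = x1 * x2; break;
      case "Division": val = x1 / x2; break;
      default: throw new ArgumentException("unknown symbol " + n.Symbol);
    }
  }
}

static class Demo {
  // Builds the sentence (x1 * x2) + 0.5 and evaluates it on the given row.
  public static double Evaluate(double[,] data, int row) {
    var tree = new Node { Symbol = "Addition", Children = new[] {
      new Node { Symbol = "Multiplication", Children = new[] {
        new Node { Symbol = "Var", VarName = "x1" },
        new Node { Symbol = "Var", VarName = "x2" } } },
      new Node { Symbol = "ERC", Constant = 0.5 } } };
    double result;
    new Interpreter(data, new[] { "x1", "x2" }).Model(tree, row, out result);
    return result;
  }

  public static void Main() {
    // 2 * 3 + 0.5
    Console.WriteLine(Demo.Evaluate(new double[,] { { 2.0, 3.0 } }, 0));
  }
}
}}}

In this sketch the chosen alternative is stored in the tree. A compiled GPDL solver is free to represent sentences differently, as long as the semantic actions are executed in the order given by the grammar rules.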
== Building ==
The build process has two layers. The top layer contains the syntax description of GPDL. For each backend, a separate ATG must be defined, which Coco/R then translates into a framework-specific GPDL compiler. Consequently, a new GPDL compiler must be built for each new version of the GPDL specification and for each backend. Coco/R is available for many programming languages, so GPDL compilers can easily be created for different programming languages.
In the second layer, the framework-specific GPDL compiler is used to compile GPDL problem descriptions to source code for the targeted platform (backend). This source code can then be compiled to a solver that can be used to solve different instances of the general problem (e.g., by using different data files).
[[Image(GPDL-compiler-reference.png)]]