wiki:Documentation/Reference/GPDL

Context Navigation

Version 33 (modified by gkronber, 11 years ago) (diff)
--

Genetic Programming Problem Definition Language (GPDL)

Gabriel Kronberger, last update 3rd of August, 2013 gkronber@heuristiclab.com

The aim of GPDL is to make it easier to use GP-systems. Currently, it is very cumbersome to implement new problems in GP-systems, because of several factors including among others:

a lot of boiler-plate code has to be written to integrate into the GP framework,
it is necessary to learn APIs of different GP system,
problem implementations cannot be re-used to try different GP systems (e.g. ECJ, HeuristicLab, GEVA, JGAP, ...).

We argue, that the uptake of GP for real world applications has only been limited so far, because it is difficult to use the available high-quality implementations of GP, and it takes a lot of time to implement more complex GP problems.

GPDL separates the implementation of problem details from the intricacies of algorithm implementations. Only the details of the problem are specified in a framework-independent way. A compiler can transform the problem description to source code for different GP systems. This way, it will be much easier to implement problems and to use different GP implementations or even other kinds of solvers!

GPDL is based on the concept of attributed grammars with semantic actions that are usually used in compiler construction. Basically, GP can be described as search over a space of sentences of a formal language. The goal is to find a sentence with optimal objective function value. Therefore, a GP problem can be defined as a tuple of a formal language, defined e.g. via a grammar, and an objective function for sentences. Therefore, our idea is to specify GP problems via an attributed grammar with semantic actions for the interpretation of sentences, and an objective function to be minimized or maximized. Below, you will find an example for the definition of a symbolic regression problem in GPDL.

GPDL is not limited to C# or Java, and it is not limited to HeuristicLab (however, it is the only system we implemented a compiler for so far).

On this page we provide a first specification of the GPDL language and a reference implementation for a GPDL compiler for HeuristicLab.

Documentation

Syntax Definition

GPDef.atg (for Coco/R)

Reference Implementation for HeuristicLab

Tools

Coco/R (executables and frame files, .NET-version)

Example Problem Definitions

Changelog

Changes to the source code and examples are tracked in the ticket #2026.

2013/08/03: new problem definitions: multiplexer and even-parity
2013/07/19: transformed ATG for the GPDL syntax analyzer and compiler to Coco/R syntax (from Coco-2)
2013/07/08: first release at GECCO

Motivating Example: Symbolic Regression

This is a fully self-contained specification of a symbolic regression problem (Poly-10 benchmark). This file can be compiled using our reference GPDL compiler for HeuristicLab to create a solver for this problem. Different kinds of solvers available in HeuristicLab can be used to solve the problem without any changes to the problem specification.

PROBLEM SymbRegKoza
CODE <<
  double[,] x;
  double[] y;
  int[] rows;
  string[] variableNames;
  double[] randomConsts;
  
  Dictionary<string,int> nameToCol;
  
  double GetValue(double[,] data, string varName, int row) {
    if(nameToCol == null) {
      /* init mapping */
      nameToCol = new Dictionary<string, int>();
      for(int i=0; i<variableNames.Length; i++) {
        nameToCol[variableNames[i]] = i;
      }
    }
    return x[row, nameToCol[varName]];
  }

  double RSquared(IEnumerable<double> xs, IEnumerable<double> ys) {
    // calculate Pearson's correlation in one pass over xs and ys
    double sumx = 0.0;
    double sumy = 0.0;
    double sumxSq = 0.0;
    double sumySq = 0.0;
    double sumxy = 0.0;
    int n = 0;
    var xEnum = xs.GetEnumerator();
    var yEnum = ys.GetEnumerator();
    while(xEnum.MoveNext() & yEnum.MoveNext()) {
      sumx += xEnum.Current;
      sumy += yEnum.Current;
      sumxSq += xEnum.Current * xEnum.Current;
      sumySq += yEnum.Current * yEnum.Current;
      sumxy += xEnum.Current * yEnum.Current;
      n++;
    }
    System.Diagnostics.Debug.Assert(!(xEnum.MoveNext() | yEnum.MoveNext()));

    double num;
    double den;
    double r = 0.0;
    num = sumxy - ( ( sumx * sumy ) / n );
    den = Math.Sqrt( ( sumxSq - ( sumx*sumx ) / n ) *
                     ( sumySq - ( sumy*sumy ) / n ) );
    if(den > 0){
      r = num / den;
    }
    return r*r;
  } 
>>

INIT <<
  // generate 500 cases of poly-10 benchmark function
  int n = 500;
  variableNames = new string[] {"x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10" };
  var rand = new System.Random();
  x = new double[n, 10];
  y = new double[n];
  for(int row = 0; row < n; row++) {
    for(int col = 0; col < 10; col++) {
      x[row, col] = rand.NextDouble() * 2.0 - 1.0;
    }
    y[row] = x[row, 0] * x[row, 1] +
             x[row, 2] * x[row, 3] +
             x[row, 4] * x[row, 5] +
             x[row, 0] * x[row, 6] + x[row, 8] +
             x[row, 2] * x[row, 5] + x[row, 9];
  }
 
  rows = System.Linq.Enumerable.Range(0, n).ToArray();
  
  // generate 100 random constants in [-10.0 .. 10.0[
  randomConsts = Enumerable.Range(0, 100).Select(i => rand.NextDouble()*20.0 - 10.0).ToArray();
>>

NONTERMINALS
  Model<<int row, out double val>>.
  RPB<<int row, out double val>>.
  Addition<<int row, out double val>>.
  Subtraction<<int row, out double val>>.
  Multiplication<<int row, out double val>>.
  Division<<int row, out double val>>.

TERMINALS
  ERC<<out double val>> 
    CONSTRAINTS
      val IN SET << randomConsts >>
  .

  Var<<out string varName>> 
    CONSTRAINTS
      varName IN SET << variableNames >>
  .

RULES
  Model<<int row, out double val>> =
    RPB<<row, out val>> .

  RPB<<int row, out double val>> =                         LOCAL << string varName; >>
    Addition<<row, out val>>
    | Subtraction<<row, out val>>
    | Division<<row, out val>>
    | Multiplication<<row, out val>>
    | Var<<out varName>>                                   SEM << val = GetValue(x, varName, row); >>
    /* | ERC<<out val>> */
  .

  Addition<<int row, out double val>> =                    LOCAL << double x1, x2; >>
      RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 + x2; >>
  .
  Subtraction<<int row, out double val>> =                 LOCAL << double x1, x2; >>
      RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 - x2; >>
  .
  Division<<int row, out double val>> =                    LOCAL << double x1, x2; >>
      RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 / x2; >>
  .
  Multiplication<<int row, out double val>> =              LOCAL << double x1, x2; >>
      RPB<<row, out x1>> RPB<<row, out x2>>                SEM<< val = x1 * x2; >>
  .

MAXIMIZE
  << 
    var predicted = rows.Select(r => {
      double result;
      Model(r, out result);                                /* we can call the root symbol directly */
      return result;
    });
    return RSquared(predicted, y);
  >>
END SymbRegKoza.

Building

The top layer contains the syntax description of GPDL. For each separate backend a separate ATG must be defined that is then translated with Coco/R to generate a framework-specific GPDL compiler. So, at the top level a new GPDL compiler must be built for each new version of the GPDL specification and for each backend. Coco/R is available for many programming languages, so we can easily create different GPDL compilers for different programming languages.

In the second layer, the framework-specific GPDL compiler can then be used to compile GPDL problem descriptions to source code for the targeted platform (backend). This source code can be compiled to a solver, that can be used to solve several different problem instances of the general problem (e.g., by using different data files).