Opened 7 months ago

Last modified 2 months ago

#2704 accepted feature request

Generate random regression benchmark instances

Reported by: bburlacu
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.x Backlog
Component: Problems.DataAnalysis.Symbolic
Version: 3.3.14
Keywords:
Cc:

Description

The ability to randomly generate symbolic regression benchmarks according to user-defined rules would be useful for testing new algorithms and for the development of knowledge networks.

A previous effort (ticket #2083) generates data according to user-specified formulas. This ticket has a different scope: the focus shifts to generating random instances from user-defined templates.

Attachments (2)

Expression Generator Example.hl (1.4 KB) - added by bburlacu 7 months ago.
Expression Generator Code Sample.cs (2.4 KB) - added by bburlacu 6 months ago.


Change History (19)

comment:1 Changed 7 months ago by bburlacu

r14409: Initial commit of HeuristicLab.ExpressionGenerator plugin. Note that the code depends on #2703, which should be merged first.

Last edited 7 months ago by bburlacu

comment:2 Changed 7 months ago by bburlacu

r14410: Added the possibility to sample possible arguments without repetition when instantiating expression templates.
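
Sampling without repetition, as mentioned above, can be illustrated with a partial Fisher-Yates shuffle. The following is only a minimal sketch, not the code committed in r14410: the names SamplingSketch and SampleWithoutRepetition are hypothetical, and System.Random stands in for whatever random number generator the plugin actually uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of HeuristicLab.ExpressionGenerator.
public static class SamplingSketch {
  // Returns k distinct elements of items, chosen uniformly at random
  // by running the first k steps of a Fisher-Yates shuffle on a copy.
  public static List<T> SampleWithoutRepetition<T>(IList<T> items, int k, Random rng) {
    if (k < 0 || k > items.Count) throw new ArgumentOutOfRangeException("k");
    var pool = new List<T>(items);        // copy, so the caller's list is not modified
    for (int i = 0; i < k; i++) {
      int j = rng.Next(i, pool.Count);    // index in the not-yet-chosen tail [i, Count)
      T tmp = pool[i];
      pool[i] = pool[j];
      pool[j] = tmp;
    }
    return pool.GetRange(0, k);           // the first k positions hold the sample
  }
}
```

With such a helper, a template that takes three arguments could draw them from the variable list without the same variable appearing twice in one instance.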

comment:3 Changed 7 months ago by bburlacu

r14411: Slightly improve usage of ExpressionTemplate class.

comment:4 Changed 7 months ago by bburlacu

r14448: Rename expression label to name, and remove label from expression template as it should be specified for each individual instance.

Changed 7 months ago by bburlacu

comment:5 Changed 7 months ago by gkronber

Please add AssemblyInfo frame file

comment:6 Changed 7 months ago by gkronber

Please also include a 'reference' template for expressions in the plugin, instead of the C# script.

comment:7 Changed 7 months ago by bburlacu

r14480: Implement export of expressions as infix strings. Include missing AssemblyInfo.cs.frame file and set language version to C# 4.0.

comment:8 Changed 6 months ago by bburlacu

r14505: Improve expression generation and fix file formatting.

comment:9 Changed 6 months ago by bburlacu

r14510: Minor changes (sample template arguments without repetition)

Changed 6 months ago by bburlacu

comment:10 Changed 6 months ago by bburlacu

  • Owner changed from bburlacu to gkronber
  • Status changed from new to assigned

r14515: Fix infix formatting bug, refactor generation of polynomial to respect user choices (with or without exp/log).

comment:11 Changed 6 months ago by bburlacu

r14520: Improve infix formatting

comment:12 Changed 5 months ago by gkronber

  • Status changed from assigned to accepted

comment:13 Changed 4 months ago by gkronber

  • Milestone changed from HeuristicLab 3.3.15 to HeuristicLab 3.3.x Backlog

comment:14 Changed 2 months ago by gkronber

TODO:

  • Limit the number of variables in 1/x, log(x) and exp(x)
  • Allow definition of ranges for argument values for 1/x, log(x) and exp(x), and automatically generate constants (e.g. in log(c1 * (x1 + x2 + x3) + c2), c1 and c2 should be set automatically to make sure that taking the logarithm produces useful results; same for exp(...))

Examples of functions currently produced:

(1 / (x7 * x4 * x3) + x4 + 1 / (x2 + x1 + x9 + x4 + x6 + x10 + x7 + x3) + x8 + 1 / x4 + 1 / (x6 * x10 * x5 * x7 * x1 * x4 * x9 * x3 * x2) + x3 + x6)

(exp(x10) + exp((x4 * x3 * x2)) + log(x9) + 1 / exp(x1) + exp(x2) + exp(x6) + x5 + 1 / x2 + 1 / exp((x9 * x4 * x6 * x5 * x7 * x8)))
  • 1 / (x2 + x1 + x9 + x4 + x6 + x10 + x7 + x3) is almost impossible to find
  • the argument of log(x9) might be negative
  • exp(x4 * x3 * x2) might become very large
  • 1/x2 has an instability

comment:15 Changed 2 months ago by gkronber

Script to test the generator:

using System;
using System.Linq;
using System.Collections.Generic;
using HeuristicLab.Core;
using HeuristicLab.Common;
using HeuristicLab.ExpressionGenerator;
using HeuristicLab.Random;
using HeuristicLab.Problems.DataAnalysis;
using HeuristicLab.Problems.DataAnalysis.Symbolic;
using HeuristicLab.Problems.DataAnalysis.Symbolic.Regression;
using HeuristicLab.Optimization;


public class MyScript : HeuristicLab.Scripting.CSharpScriptBase {
  public override void Main() {
    // type your code here
    var urand = new FastRandom(1234);
    var nrand = new NormalDistributedRandom(urand, 0, 1);
    var variables = ExpressionGenerator.GenerateRandomDistributedVariables(10, "x", nrand).ToArray();
    var experiment = new Experiment();
    vars["experiment"] = experiment;
    for (int i = 0; i < 20; i++) {
      var p = ExpressionGenerator.RationalExpression(urand, variables, useLog: true, useExp: true);
      // Console.WriteLine(p.PrintDot());
      Console.WriteLine(p.PrintInfix());

      var infixParser = new InfixExpressionParser();
      var t = infixParser.Parse(p.PrintInfix());
      var simplifier = new TreeSimplifier();
      t = simplifier.Simplify(t);
      var formatter = new InfixExpressionFormatter();
      Console.WriteLine(formatter.Format(t));
      vars["t"] = t;


      var dataForDs = new List<List<double>>();
      var variableNames = new List<string>();
    
      var eval = new ExpressionEvaluator();

      // sample    
      var data = eval.GenerateData(variables.Concat(new [] {p}), 1000);
    
      foreach(var variable in variables) {
        dataForDs.Add(data[variable]);
        variableNames.Add(variable.Name);
      }
      dataForDs.Add(data[p]);
      variableNames.Add("y");
    
      var ds = new Dataset(variableNames, dataForDs);
     
      var problemData = new RegressionProblemData(ds, variables.Select(v => v.Name), "y");
      vars["problemData"] = problemData;
      // var osgp = (IAlgorithm) vars["OSGP"];
      // var regProblem = (SymbolicRegressionSingleObjectiveProblem)osgp.Problem;                                      
      // regProblem.Name =  formatter.Format(t);
      // osgp.Name = "OSGP - " + regProblem.Name;
      // regProblem.ProblemData = problemData;
      // 
      // var br = new BatchRun();
      // br.Repetitions = 5;
      // br.Optimizer = (IOptimizer)osgp.Clone();
      // br.Name = br.Optimizer.Name;
      // experiment.Optimizers.Add(br);      
    }
  }
}

Last edited 2 months ago by gkronber

comment:16 Changed 2 months ago by gkronber

r14873: added possibility to automatically adjust constants based on the distributions of evaluated expressions to allow limiting distributions for arguments of functions (e.g. log should have positive args only, exp should have rather small args only)
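
One way to picture such an adjustment is to fit an affine transformation c1 * s + c2 to the sampled values s of a sub-expression, so that the transformed values fall into a chosen range. The sketch below is an assumption about how this could be done, not the algorithm committed in r14873; the class ConstantAdjustmentSketch, the method FitAffine and the range bounds are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of HeuristicLab.ExpressionGenerator.
public static class ConstantAdjustmentSketch {
  // Returns (c1, c2) such that c1 * v + c2 maps the smallest sampled value to lo
  // and the largest sampled value to hi.
  public static Tuple<double, double> FitAffine(IEnumerable<double> values, double lo, double hi) {
    var v = values.ToArray();
    if (v.Length == 0) throw new ArgumentException("values must not be empty");
    if (!(lo < hi)) throw new ArgumentException("lo must be smaller than hi");
    double min = v.Min(), max = v.Max();
    if (max == min) {
      // All samples are equal: keep the scale and shift to the middle of the range.
      return Tuple.Create(1.0, (lo + hi) / 2.0 - min);
    }
    double c1 = (hi - lo) / (max - min);  // scale so the sample spread matches hi - lo
    double c2 = lo - c1 * min;            // shift so the minimum lands on lo
    return Tuple.Create(c1, c2);
  }
}
```

For a logarithm, a strictly positive range such as [0.1, 10] keeps every sampled argument positive; for exp, a small range such as [-2, 2] keeps the results from growing very large. The guarantee holds only for the sampled values, so data drawn later from the same distributions can still fall outside the range.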

comment:17 Changed 2 months ago by gkronber

r14880: added more expression templates
