Opened 3 years ago

Last modified 3 months ago

#2704 accepted feature request

Generate random regression benchmark instances

Reported by: bburlacu
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.17
Component: Problems.DataAnalysis.Symbolic
Version: branch
Keywords:
Cc:

Description

The ability to randomly generate symbolic regression benchmarks according to user-defined rules would be useful for testing new algorithms and for the development of knowledge networks.

A previous effort (ticket #2083) generates data from user-specified formulas. This ticket has a different scope: it focuses on generating random instances from user-defined templates.

Attachments (2)

Expression Generator Example.hl (1.4 KB) - added by bburlacu 3 years ago.
Expression Generator Code Sample.cs (2.4 KB) - added by bburlacu 3 years ago.

Change History (24)

comment:1 Changed 3 years ago by bburlacu

r14409: Initial commit of HeuristicLab.ExpressionGenerator plugin. Note that the code depends on #2703 which should be merged first.

Last edited 3 years ago by bburlacu

comment:2 Changed 3 years ago by bburlacu

r14410: Added the possibility to sample possible arguments without repetition when instantiating expression templates.
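Sampling without repetition can be illustrated with a minimal sketch. This is not the plugin's code: the function name `instantiate_template`, the template string syntax, and the variable names are hypothetical and chosen only for illustration.

```python
import random


def instantiate_template(template, candidates, n_args, rng):
    """Fill a template with n_args distinct arguments drawn from candidates.

    Hypothetical helper: the template is a plain format string with one
    "{args}" placeholder, which is an assumption made for this sketch.
    """
    # Random.sample draws without replacement, so no argument is repeated.
    chosen = rng.sample(candidates, n_args)
    return template.format(args=" * ".join(chosen))


rng = random.Random(1234)
variables = [f"x{i}" for i in range(1, 11)]

# Instantiate a reciprocal-of-product template with three distinct variables.
expr = instantiate_template("1 / ({args})", variables, 3, rng)
print(expr)
```

Drawing without replacement avoids terms such as `x4 * x4 * x4`, which would change the degree of a product term instead of adding a new variable to it.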

comment:3 Changed 3 years ago by bburlacu

r14411: Slightly improve usage of ExpressionTemplate class.

comment:4 Changed 3 years ago by bburlacu

r14448: Rename expression label to name, and remove label from expression template as it should be specified for each individual instance.

Changed 3 years ago by bburlacu

comment:5 Changed 3 years ago by gkronber

Please add the AssemblyInfo frame file.

comment:6 Changed 3 years ago by gkronber

Please also include a 'reference' template for expressions in the plugin instead of the C# script.

comment:7 Changed 3 years ago by bburlacu

r14480: Implement export of expressions as infix strings. Include missing AssemblyInfo.cs.frame file and set language version to C# 4.0.

comment:8 Changed 3 years ago by bburlacu

r14505: Improve expression generation and fix file formatting.

comment:9 Changed 3 years ago by bburlacu

r14510: Minor changes (sample template arguments without repetition)

Changed 3 years ago by bburlacu

comment:10 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to gkronber
  • Status changed from new to assigned

r14515: Fix infix formatting bug and refactor generation of polynomials to respect user choices (with or without exp/log).

comment:11 Changed 3 years ago by bburlacu

r14520: Improve infix formatting

comment:12 Changed 2 years ago by gkronber

  • Status changed from assigned to accepted

comment:13 Changed 2 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.15 to HeuristicLab 3.3.x Backlog

comment:14 Changed 2 years ago by gkronber

TODO:

  • Limit the number of variables in 1/x, log(x) and exp(x).
  • Allow the definition of ranges for the argument values of 1/x, log(x) and exp(x), and generate constants automatically (e.g. in log(c1 * (x1 + x2 + x3) + c2), c1 and c2 should be set automatically to make sure that taking the logarithm produces useful results; the same applies to exp(...)).

Examples of functions currently produced:

(1 / (x7 * x4 * x3) + x4 + 1 / (x2 + x1 + x9 + x4 + x6 + x10 + x7 + x3) + x8 + 1 / x4 + 1 / (x6 * x10 * x5 * x7 * x1 * x4 * x9 * x3 * x2) + x3 + x6)

(exp(x10) + exp((x4 * x3 * x2)) + log(x9) + 1 / exp(x1) + exp(x2) + exp(x6) + x5 + 1 / x2 + 1 / exp((x9 * x4 * x6 * x5 * x7 * x8)))

  • 1 / (x2 + x1 + x9 + x4 + x6 + x10 + x7 + x3) is almost impossible to find
  • log(x9) is undefined when x9 is negative
  • exp(x4 * x3 * x2) might become very large
  • 1 / x2 is unstable near x2 = 0
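The numerical problems listed above can be checked with a short simulation. The sketch below assumes standard normal inputs, matching the N(0, 1) variables used in the test script of this ticket. It uses only the Python standard library, not the generator itself, and all names in it are illustrative.

```python
import math
import random

rng = random.Random(1234)
n = 10_000

# One standard normal input variable, sampled n times.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# log(x) is undefined for x <= 0; with N(0, 1) inputs this is about half the rows.
undefined_log = sum(1 for v in x if v <= 0.0) / n

# exp of a product of three N(0, 1) variables has a very heavy right tail.
products = [rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0)
            for _ in range(n)]
largest_exp = max(math.exp(p) for p in products)

# 1 / x grows without bound as x approaches zero.
largest_inv = max(abs(1.0 / v) for v in x if v != 0.0)

print(f"fraction of rows where log(x) is undefined: {undefined_log:.3f}")
print(f"largest exp(x1 * x2 * x3): {largest_exp:.3g}")
print(f"largest |1 / x|: {largest_inv:.3g}")
```

A target column containing undefined or extreme values like these is of limited use as a regression benchmark, which motivates restricting the argument ranges of log, exp and 1/x.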

comment:15 Changed 2 years ago by gkronber

Script to test the generator:

using System;
using System.Linq;
using System.Collections.Generic;
using HeuristicLab.Core;
using HeuristicLab.Common;
using HeuristicLab.ExpressionGenerator;
using HeuristicLab.Random;
using HeuristicLab.Problems.DataAnalysis;
using HeuristicLab.Problems.DataAnalysis.Symbolic;
using HeuristicLab.Problems.DataAnalysis.Symbolic.Regression;
using HeuristicLab.Optimization;


public class MyScript : HeuristicLab.Scripting.CSharpScriptBase {
  public override void Main() {
    var urand = new FastRandom(1234);
    var nrand = new NormalDistributedRandom(urand, 0, 1);
    // 10 normally distributed input variables with the name prefix "x"
    var variables = ExpressionGenerator.GenerateRandomDistributedVariables(10, "x", nrand).ToArray();
    var experiment = new Experiment();
    vars["experiment"] = experiment;

    for (int i = 0; i < 20; i++) {
      // generate a random rational expression and print it in infix notation
      var p = ExpressionGenerator.RationalExpression(urand, variables, useLog: true, useExp: true);
      // Console.WriteLine(p.PrintDot());
      Console.WriteLine(p.PrintInfix());

      // parse the infix string, simplify the tree and print the simplified form
      var infixParser = new InfixExpressionParser();
      var t = infixParser.Parse(p.PrintInfix());
      var simplifier = new TreeSimplifier();
      t = simplifier.Simplify(t);
      var formatter = new InfixExpressionFormatter();
      Console.WriteLine(formatter.Format(t));
      vars["t"] = t;

      var dataForDs = new List<List<double>>();
      var variableNames = new List<string>();

      var eval = new ExpressionEvaluator();

      // sample 1000 rows for the input variables and the target expression
      var data = eval.GenerateData(variables.Concat(new[] { p }), 1000);

      foreach (var variable in variables) {
        dataForDs.Add(data[variable]);
        variableNames.Add(variable.Name);
      }
      // the generated expression becomes the target column "y"
      dataForDs.Add(data[p]);
      variableNames.Add("y");

      var ds = new Dataset(variableNames, dataForDs);

      var problemData = new RegressionProblemData(ds, variables.Select(v => v.Name), "y");
      vars["problemData"] = problemData;

      // optional: add a batch run of the algorithm stored in vars["OSGP"] for each instance
      // var osgp = (IAlgorithm) vars["OSGP"];
      // var regProblem = (SymbolicRegressionSingleObjectiveProblem)osgp.Problem;
      // regProblem.Name = formatter.Format(t);
      // osgp.Name = "OSGP - " + regProblem.Name;
      // regProblem.ProblemData = problemData;
      //
      // var br = new BatchRun();
      // br.Repetitions = 5;
      // br.Optimizer = (IOptimizer)osgp.Clone();
      // br.Name = br.Optimizer.Name;
      // experiment.Optimizers.Add(br);
    }
  }
}
Last edited 2 years ago by gkronber

comment:16 Changed 2 years ago by gkronber

r14873: Added the possibility to automatically adjust constants based on the distributions of evaluated expressions. This makes it possible to limit the distributions of function arguments (e.g. log should receive only positive arguments, exp only rather small arguments).
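One way to picture such an adjustment is an affine rescaling of the inner expression. The following is a minimal sketch and not the code committed in r14873: the function `fit_affine_constants`, the target interval [1, 10], and the min/max-based fit are all assumptions made for illustration.

```python
import math
import random


def fit_affine_constants(samples, target_lo, target_hi):
    """Return (c1, c2) so that c1 * s + c2 maps the observed sample range
    of the inner expression onto [target_lo, target_hi].

    Hypothetical helper; the actual plugin may use a different rule.
    """
    s_min, s_max = min(samples), max(samples)
    if s_max == s_min:
        # Degenerate case: a constant inner expression maps to the midpoint.
        return 0.0, (target_lo + target_hi) / 2.0
    c1 = (target_hi - target_lo) / (s_max - s_min)
    c2 = target_lo - c1 * s_min
    return c1, c2


rng = random.Random(1234)

# Evaluate the inner expression x1 + x2 + x3 on sampled N(0, 1) inputs.
inner = [sum(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(1000)]

# Choose c1 and c2 so that the argument of log lies in [1, 10] on this sample.
c1, c2 = fit_affine_constants(inner, 1.0, 10.0)
scaled = [c1 * s + c2 for s in inner]
y = [math.log(v) for v in scaled]

print(f"c1 = {c1:.4f}, c2 = {c2:.4f}")
```

The guarantee holds only for the sampled rows; fresh samples can still fall outside the observed range, so a quantile-based fit with a safety margin may be preferable in practice.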

comment:17 Changed 2 years ago by gkronber

r14880: Added more expression templates.

comment:18 Changed 18 months ago by gkronber

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 4.x Backlog
  • Version changed from 3.3.14 to branch

comment:19 Changed 11 months ago by abeham

Please rename your branch by prepending the ticket number to its name.

comment:20 Changed 11 months ago by bburlacu

r16136: Rename branch

comment:21 Changed 7 months ago by gkronber

  • Milestone changed from HeuristicLab 4.x Backlog to HeuristicLab 3.3.16

comment:22 Changed 3 months ago by gkronber

  • Milestone changed from HeuristicLab 3.3.16 to HeuristicLab 3.3.17