Opened 8 years ago
Last modified 6 years ago
#2704 accepted feature request
Generate random regression benchmark instances
| Reported by: | bburlacu | Owned by: | gkronber |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.17 |
| Component: | Problems.DataAnalysis.Symbolic | Version: | branch |
| Keywords: | | Cc: | |
Description
The ability to randomly generate symbolic regression benchmarks according to user-defined rules would be useful for testing new algorithms and for the development of knowledge networks.
A previous effort exists (ticket #2083) for generating data according to user-specified formulas. This ticket has a different scope: the focus shifts to generating random instances from user-defined templates.
Attachments (2)
Change History (24)
comment:1 Changed 8 years ago by bburlacu
comment:2 Changed 8 years ago by bburlacu
r14410: Added the possibility to sample possible arguments without repetition when instantiating expression templates.
comment:3 Changed 8 years ago by bburlacu
r14411: Slightly improve usage of ExpressionTemplate class.
comment:4 Changed 8 years ago by bburlacu
r14448: Rename expression label to name, and remove label from expression template as it should be specified for each individual instance.
Changed 8 years ago by bburlacu
comment:5 Changed 8 years ago by gkronber
Please add AssemblyInfo frame file
comment:6 Changed 8 years ago by gkronber
Please also include a 'reference' template for expressions in the plugin instead of the C# script.
comment:7 Changed 8 years ago by bburlacu
r14480: Implement export of expressions as infix strings. Include missing AssemblyInfo.cs.frame file and set language version to C# 4.0.
comment:8 Changed 8 years ago by bburlacu
r14505: Improve expression generation and fix file formatting.
comment:9 Changed 8 years ago by bburlacu
r14510: Minor changes (sample template arguments without repetition)
Changed 8 years ago by bburlacu
comment:10 Changed 8 years ago by bburlacu
- Owner changed from bburlacu to gkronber
- Status changed from new to assigned
r14515: Fix infix formatting bug; refactor generation of polynomials to respect user choices (with or without exp/log).
comment:11 Changed 8 years ago by bburlacu
r14520: Improve infix formatting
comment:12 Changed 8 years ago by gkronber
- Status changed from assigned to accepted
comment:13 Changed 8 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.15 to HeuristicLab 3.3.x Backlog
comment:14 Changed 8 years ago by gkronber
TODO:
- Limit the number of variables in 1/x, log(x) and exp(x)
- Allow defining ranges for the argument values of 1/x, log(x) and exp(x), and automatically generate constants (e.g. in log(c1 * (x1 + x2 + x3) + c2), c1 and c2 should be set automatically so that taking the logarithm produces useful results; the same applies to exp(...))
Examples of functions currently produced:
(1 / (x7 * x4 * x3) + x4 + 1 / (x2 + x1 + x9 + x4 + x6 + x10 + x7 + x3) + x8 + 1 / x4 + 1 / (x6 * x10 * x5 * x7 * x1 * x4 * x9 * x3 * x2) + x3 + x6)
(exp(x10) + exp((x4 * x3 * x2)) + log(x9) + 1 / exp(x1) + exp(x2) + exp(x6) + x5 + 1 / x2 + 1 / exp((x9 * x4 * x6 * x5 * x7 * x8)))
- 1 / (x2 + x1 + x9 + x4 + x6 + x10 + x7 + x3) is almost impossible to find
- log(x9): the argument x9 might be negative
- exp(x4 * x3 * x2) might become very large
- 1/x2 is unstable near x2 = 0
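The automatic choice of c1 and c2 requested above can be illustrated with a small, language-neutral sketch in Python. It is not the plugin's implementation. It assumes that the inner expression (here x1 + x2 + x3 with standard-normal inputs) is evaluated on sample data and that c1 and c2 are then chosen so that c1 * s + c2 maps the observed range onto a user-defined target range. The function name `fit_affine_constants` and the bounds 0.1 and 10.0 are hypothetical.

```python
import random


def fit_affine_constants(samples, lo, hi):
    """Return (c1, c2) so that c1 * s + c2 maps [min(samples), max(samples)] onto [lo, hi]."""
    s_min, s_max = min(samples), max(samples)
    if s_max == s_min:
        # Degenerate case: constant input, place it at the midpoint of the target range.
        return 0.0, (lo + hi) / 2.0
    c1 = (hi - lo) / (s_max - s_min)
    c2 = lo - c1 * s_min
    return c1, c2


rng = random.Random(1234)

# Inner expression x1 + x2 + x3 with standard-normal inputs (assumed setup).
inner = [sum(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(1000)]

# Hypothetical target range for the argument of log: strictly positive, not too large.
c1, c2 = fit_affine_constants(inner, lo=0.1, hi=10.0)
scaled = [c1 * s + c2 for s in inner]
```

After rescaling, every value in `scaled` lies in the target range, so log(c1 * (x1 + x2 + x3) + c2) is defined on all sampled points. The same idea applies to exp(...) with a target range that keeps the argument small. Using quantiles instead of the minimum and maximum would be less sensitive to outliers.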
comment:15 Changed 8 years ago by gkronber
Script to test the generator:
using System;
using System.Linq;
using System.Collections.Generic;
using HeuristicLab.Core;
using HeuristicLab.Common;
using HeuristicLab.ExpressionGenerator;
using HeuristicLab.Random;
using HeuristicLab.Problems.DataAnalysis;
using HeuristicLab.Problems.DataAnalysis.Symbolic;
using HeuristicLab.Problems.DataAnalysis.Symbolic.Regression;
using HeuristicLab.Optimization;

public class MyScript : HeuristicLab.Scripting.CSharpScriptBase {
  public override void Main() {
    // type your code here
    var urand = new FastRandom(1234);
    var nrand = new NormalDistributedRandom(urand, 0, 1);
    var variables = ExpressionGenerator.GenerateRandomDistributedVariables(10, "x", nrand).ToArray();
    var experiment = new Experiment();
    vars["experiment"] = experiment;
    for (int i = 0; i < 100; i++) {
      var p = ExpressionGenerator.NonlinearExpression(urand, variables, useLog: true, useExp: true);
      // Console.WriteLine(p.PrintDot());
      // Console.WriteLine(p.PrintInfix());
      var infixParser = new InfixExpressionParser();
      var t = infixParser.Parse(p.PrintInfix());
      var simplifier = new TreeSimplifier();
      t = simplifier.Simplify(t);
      var formatter = new InfixExpressionFormatter();
      Console.WriteLine(formatter.Format(t));
      vars["t"] = t;
      var dataForDs = new List<List<double>>();
      var variableNames = new List<string>();
      var eval = new ExpressionEvaluator();
      // sample
      var data = eval.GenerateData(variables.Concat(new [] {p}), 1000);
      foreach (var variable in variables) {
        dataForDs.Add(data[variable]);
        variableNames.Add(variable.Name);
      }
      dataForDs.Add(data[p]);
      variableNames.Add("y");
      var ds = new Dataset(variableNames, dataForDs);
      var problemData = new RegressionProblemData(ds, variables.Select(v => v.Name), "y");
      vars["problemData"] = problemData;
      // var osgp = (IAlgorithm) vars["OSGPC"];
      // var regProblem = (SymbolicRegressionSingleObjectiveProblem)osgp.Problem;
      // regProblem.Name = formatter.Format(t);
      // osgp.Name = "OSGPC - " + regProblem.Name;
      // regProblem.ProblemData = problemData;
      //
      // var br = new BatchRun();
      // br.Repetitions = 5;
      // br.Optimizer = (IOptimizer)osgp.Clone();
      // br.Name = br.Optimizer.Name;
      // experiment.Optimizers.Add(br);
    }
  }
}
comment:16 Changed 8 years ago by gkronber
r14873: added possibility to automatically adjust constants based on the distributions of evaluated expressions to allow limiting distributions for arguments of functions (e.g. log should have positive args only, exp should have rather small args only)
comment:17 Changed 8 years ago by gkronber
r14880: added more expression templates
comment:18 Changed 7 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 4.x Backlog
- Version changed from 3.3.14 to branch
comment:19 Changed 6 years ago by abeham
Please rename your branch by prepending the ticket number to its name.
comment:20 Changed 6 years ago by bburlacu
r16136: Rename branch
comment:21 Changed 6 years ago by gkronber
- Milestone changed from HeuristicLab 4.x Backlog to HeuristicLab 3.3.16
comment:22 Changed 6 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.16 to HeuristicLab 3.3.17
r14409: Initial commit of HeuristicLab.ExpressionGenerator plugin. Note that the code depends on #2703 which should be merged first.