Opened 4 months ago

Last modified 16 hours ago

#3136 reviewing feature request

Structure templates for symbolic regression

Reported by: dpiringe
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.17
Component: Problems.DataAnalysis.Symbolic
Version: branch
Keywords:
Cc:

Description (last modified by gkronber)

Implementing a new problem type with a structure template parameter for symbolic regression. The following components are necessary:

  • MultiEncoding
  • Views
  • Parser for structure: to parse an infix expression
  • Evaluator: based on IABoundEstimator
  • Interpreter: builds a full expression, based on structure template parameter and (multiple) evolved sub-expressions

Change History (76)

comment:1 Changed 4 months ago by dpiringe

  • Status changed from new to accepted

comment:2 Changed 4 months ago by dpiringe

  • Summary changed from GP ProblemType with Structural Parameters to Structural GP
  • Version changed from 3.3.16 to branch

r18054

  • branched trunk

comment:3 Changed 4 months ago by dpiringe

r18061

  • added a new type of problem called StructuredSymbolicRegressionSingleObjectiveProblem, which serves as a first construct for future implementations

comment:4 Changed 4 months ago by dpiringe

r18062

  • added a new Symbol SubFunctionSymbol for sub functions
  • modified InfixExpressionParser to support SubFunctionSymbol (parsing of variableNames is still in progress)
  • modified StructuredSymbolicRegressionSingleObjectiveProblem to extract sub functions and add them to MultiEncoding

comment:5 Changed 4 months ago by dpiringe

r18063

  • added view components and classes for sub functions

comment:6 Changed 4 months ago by dpiringe

r18065

  • modified InfixExpressionParser to fully support SubFunctionSymbol
    • created a SubFunctionTreeNode to store the function arguments
  • modified StructureTemplateView to regenerate the content state
  • first implementation for the main tree build up logic

comment:7 Changed 3 months ago by dpiringe

r18066

  • added a simple way of evaluation (using r2 evaluator)
  • added simple analysis logic for "Best Tree"
  • added a connection to SubFunction in SubFunctionTreeNode

comment:8 Changed 3 months ago by dpiringe

r18067

  • changed the StructureTemplateView -> nodes of type SubFunctionTreeNode are now clickable
    • still need to overhaul the UI elements
  • added a way to parse SubFunctionTreeNode with a unique name in InfixExpressionParser

comment:9 Changed 3 months ago by dpiringe

r18068

  • modified the StructureTemplateView to enable colorful tree nodes of type SubFunctionTreeNode
  • refactored SubFunctionTreeNode, SubFunction and StructureTemplate

comment:10 Changed 3 months ago by dpiringe

r18069

  • added linear scaling support for structure template parameter

comment:11 Changed 3 months ago by dpiringe

r18071

  • added linear scaling logic in Evaluate and (for UI reasons) Analyze
  • added logic for SubFunctionSymbol (modified OpCodes) -> the SubFunctionTreeNode is displayed in the tree but has no effect on evaluation (works like a flag)
    • works now with SymbolicDataAnalysisExpressionTreeInterpreter
  • default grammar for SubFunction is now ArithmeticExpressionGrammar instead of LinearScalingGrammar
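
For readers unfamiliar with linear scaling, the idea can be sketched as follows. This is a minimal, self-contained illustration and not the code committed to the branch: given raw model outputs f and targets y, the offset and slope that minimize the squared error of offset + slope * f have a closed form. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of HeuristicLab: closed-form linear scaling.
public static class LinearScalingSketch {
  // Returns (offset, slope) minimizing sum((offset + slope * f[i] - y[i])^2).
  public static (double Offset, double Slope) Fit(double[] f, double[] y) {
    if (f.Length != y.Length || f.Length == 0)
      throw new ArgumentException("f and y must be non-empty and of equal length.");

    double meanF = f.Average();
    double meanY = y.Average();
    double cov = 0.0, varF = 0.0;
    for (int i = 0; i < f.Length; i++) {
      cov += (f[i] - meanF) * (y[i] - meanY);
      varF += (f[i] - meanF) * (f[i] - meanF);
    }
    // Assumption of this sketch: a constant model output gets slope 0,
    // so the scaled model falls back to the mean of the targets.
    double slope = varF == 0.0 ? 0.0 : cov / varF;
    double offset = meanY - slope * meanF;
    return (offset, slope);
  }
}
```

Applying the fitted offset and slope to the model output before computing the error makes the fitness insensitive to the scale and shift of the evolved expression.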

comment:12 Changed 3 months ago by chaider

r18072

  • Added info text in StructureTemplateView
  • Fixed cloning constructors
  • Added check if linear scaling nodes are set

comment:13 Changed 3 months ago by dpiringe

r18073

  • fixed a bug: parsing an expression now resets the viewHost content, which prevents stale content of non-existent sub-functions from being shown

comment:14 Changed 3 months ago by chaider

r18074

  • Fixed cloning of StructureTemplate

comment:15 Changed 3 months ago by dpiringe

r18075

  • added a hidden interpreter parameter for StructuredSymbolicRegressionSingleObjectiveProblem
  • fixed a bug which crashed the application when changing to a ProblemData with different variables
  • fixed a bug which crashed the application when running the problem with an empty StructureTemplate
  • improved the output of exceptions of type AggregateException
  • added a resize event handler to repaint nodes of type SubFunctionTreeNode
  • code cleanup

comment:16 Changed 3 months ago by gkronber

  • Description modified (diff)
  • Milestone changed from HeuristicLab 4.0 to HeuristicLab 3.3.17
  • Summary changed from Structural GP to Structure templates for symbolic regression

comment:17 Changed 3 months ago by dpiringe

r18076

  • added new parameters
  • added the built tree to the scope, which allows operators to use the final tree
  • added new operators

comment:18 Changed 2 months ago by dpiringe

r18081

  • changed the visibility of the following parameters: EstimationLimitsParameter, EvaluatorParameter and BestTrainingSolutionParameter
  • added first steps to set an evaluator as parameter
    • added a new parameter TreeEvaluatorParameter
    • added temporary logic to the static evaluator method Calculate
    • tried to change a lot of necessary parameters to use the method Evaluate, this caused a lot of problems -> reverted all changes

comment:19 Changed 2 months ago by dpiringe

r18084

  • added a new problem data provider AsadzadehProvider and the corresponding instance Asadzadeh1
    • implements the test setup of the paper "Symbolic regression based hybrid semiparametric modelling of processes: An example case of a bending process"
  • used the Asadzadeh1 instance in StructuredSymbolicRegressionSingleObjectiveProblem for default setup
  • added the SubFunctionSymbol in DerivativeCalculator and IntervalArithBoundsEstimator

comment:20 Changed 2 months ago by dpiringe

r18095

  • added an Evaluate method, which uses the static method Calculate and evaluates an ISymbolicExpressionTree without the need for an ExecutionContext
    • implemented this new method in all single objective SymReg evaluators

comment:21 Changed 7 weeks ago by dpiringe

r18099

  • set the default template to f(_) when loading a new problem data
  • fixed a bug which caused the drawing of uncolored SubFunctionTreeNodes after using the window splitter
  • implemented a method to paint nodes of SubFunctionTreeNode as colored nodes for ISymbolicDataAnalysisModel

comment:22 Changed 7 weeks ago by dpiringe

r18101

  • recreated problem instance Asadzadeh1 as SheetBendingProcess
  • SheetBendingProcess is located in Physics and provided by Physics/PhysicsInstanceProvider

comment:23 Changed 7 weeks ago by dpiringe

r18103

  • refactored the evaluation logic of NMSESingleObjectiveConstraintsEvaluator
  • refactored the new method Evaluate for PearsonRSquaredAverageSimilarityEvaluator
  • changed the parameter order of some evaluate/calculate methods

comment:24 Changed 7 weeks ago by dpiringe

r18104

  • overrode the method GetActualValue in ValueLookupParameter to get the default value when the execution context is null
  • reverted the linear scaling logic for NMSESingleObjectiveConstraintsEvaluator
  • in SymbolicRegressionConstantOptimizationEvaluator: removed the usage of GenerateRowsToEvaluate because it uses lookup parameters
  • set the value of RelativeNumberOfEvaluatedSamplesParameter for SymbolicRegressionConstantOptimizationEvaluator in StructuredSymbolicRegressionSingleObjectiveProblem if Maximization = true and the SymbolicRegressionConstantOptimizationEvaluator is configured as evaluator
  • added the SubFunctionSymbol in TreeToAutoDiffTermConverter

comment:25 Changed 5 weeks ago by dpiringe

r18133

  • updated interpreters of type ISymbolicDataAnalysisExpressionTreeInterpreter to support symbols of type SubFunctionSymbol

comment:26 Changed 5 weeks ago by dpiringe

r18134

  • added a new information box for StructureTemplate in StructureTemplateView with an extended description about structure templates

comment:27 Changed 5 weeks ago by dpiringe

r18139

  • changed the item name of SubFunctionSymbol from SubFunctionSymbol to SubFunction

comment:28 Changed 5 weeks ago by mkommend

r18146: Merged trunk changes into branch.

comment:29 Changed 5 weeks ago by mkommend

r18149: Merged trunk changes into branch.

comment:30 Changed 5 weeks ago by mkommend

r18150: Merged trunk changes into branch.

comment:31 Changed 5 weeks ago by dpiringe

r18151

  • fixed re-registration of event handlers after deserialization/cloning
  • added a test case for StructuredSymbolicRegressionSingleObjectiveProblem
  • changed the usage of a Dictionary to List

comment:32 Changed 5 weeks ago by dpiringe

r18152

  • removed the calculation of EstimationLimits and set the interval [-inf, inf] as default
    • this parameter was never adjusted after problem construction -> caused bugs with the change of problem data
  • created two new methods to set up/create the MultiEncoding and SymbolicExpressionTreeEncoding
  • configured the default template f(_) for a structure template

comment:33 Changed 5 weeks ago by dpiringe

r18154

  • overrode the method SetEnabledStateOfControls for StructureTemplateView
  • fixed the wrong usage of infoLabel in StructureTemplateView -> added a new label errorLabel for textual output
  • deleted the resource file for StructureTemplateView

comment:34 Changed 5 weeks ago by dpiringe

r18155

  • merged trunk into branch

comment:35 Changed 5 weeks ago by dpiringe

r18156

  • adapted the unit test RunStructureTemplateRegressionSampleTest to match the results
  • added the sample to the optimizer start page

comment:36 Changed 5 weeks ago by dpiringe

r18157

  • adapted formatters to support SubFunctionSymbol

comment:37 Changed 5 weeks ago by gkronber

r18158: fixed "essential" unit tests

comment:38 follow-up: Changed 5 weeks ago by gkronber

@dpiringe, could you please also add support for "Number" to the native interpreter? I do not have the necessary dev environment installed.

Last edited 5 weeks ago by gkronber

comment:39 follow-up: Changed 5 weeks ago by gkronber

Please use BatchInterpreter instead of TreeInterpreter because it is more efficient.

comment:40 Changed 5 weeks ago by dpiringe

r18161

  • merged trunk into branch

comment:41 in reply to: ↑ 38 Changed 5 weeks ago by dpiringe

Replying to gkronber:

@dpiringe, could you please also add support for "Number" to the native interpreter? I do not have the necessary dev environment installed.

implemented it in ticket #3140 and merged it back into this branch as well as trunk

comment:42 in reply to: ↑ 39 Changed 5 weeks ago by dpiringe

Replying to gkronber:

Please use BatchInterpreter instead of TreeInterpreter because it is more efficient.

r18162: changed the parameter Interpreter in StructuredSymbolicRegressionSingleObjectiveProblem to use SymbolicDataAnalysisExpressionTreeBatchInterpreter as default interpreter

comment:43 follow-up: Changed 5 weeks ago by gkronber

Bugs / suggestions for improvements:

  • values for num are not optimized / changed (tested with paramopt and batchInterpreter) -> #3140
  • num values cannot be initialized to negative values (<num=-1.5>). Only via workaround: -<num=1.5>. -> #3140
  • Configuration for sub-functions (grammar, size limits) should not be lost when reparsing (especially when only a minor detail in the structure expression is changed).
  • power-symbol: maybe it would be better to throw in the interpreter only when problematic arguments are actually detected (negative base with real-valued power). This would simplify entry of structure-templates. (we should create a new ticket for this.)

Last edited 5 weeks ago by gkronber
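
The power-symbol suggestion above could look roughly like the following sketch. It is not the HeuristicLab interpreter; the class and method names are hypothetical. It only illustrates deferring the check to evaluation time and throwing only for the problematic case named in the comment (negative base with a non-integer exponent).

```csharp
using System;

// Hypothetical sketch, not the HeuristicLab interpreter.
public static class PowerCheckSketch {
  public static double Power(double baseValue, double exponent) {
    // An exponent counts as integer-valued if it is finite and has no fractional part.
    bool isIntegerExponent =
      !double.IsInfinity(exponent) && exponent == Math.Floor(exponent);

    // Only this combination has no real-valued result, so only this one throws.
    if (baseValue < 0.0 && !isIntegerExponent)
      throw new ArgumentException(
        "Negative base " + baseValue + " with non-integer exponent " + exponent + ".");

    return Math.Pow(baseValue, exponent);
  }
}
```

With such a check, a template like x^2 parses and evaluates for negative x, while x^0.5 fails only for rows where x is actually negative.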

comment:44 Changed 5 weeks ago by dpiringe

r18164

  • fixed a bug in StructureTemplateView -> only nodes of type SubFunctionTreeNode are selectable
  • added a way to keep old sub functions after parsing a new expression
    • overrode some basic object methods for SubFunction to keep it simple
    • only old sub functions, which match the name and signature of the new ones, are saved; examples:
      • old: f(x), new: f(x) -> keep old
      • old: f(x1), new: f(x1, x2) -> use new
      • old: f1(x), new f2(x) -> use new

Last edited 5 weeks ago by dpiringe
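
The matching rule from the examples above can be sketched as follows. This is an illustration under assumptions, not the code of r18164: SubFunctionSketch is a hypothetical stand-in for SubFunction that models only the name and the argument list, and MaximumTreeLength stands in for any user configuration that should survive reparsing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SubFunction; only name and arguments are modeled.
public sealed class SubFunctionSketch : IEquatable<SubFunctionSketch> {
  public string Name { get; }
  public IReadOnlyList<string> Arguments { get; }
  // Stands in for user configuration such as grammar or size limits.
  public int MaximumTreeLength { get; set; } = 20;

  public SubFunctionSketch(string name, params string[] arguments) {
    Name = name;
    Arguments = arguments.ToArray();
  }

  // Two sub functions match if name and signature (ordered arguments) are equal.
  public bool Equals(SubFunctionSketch other) =>
    other != null && Name == other.Name && Arguments.SequenceEqual(other.Arguments);

  public override bool Equals(object obj) => Equals(obj as SubFunctionSketch);

  public override int GetHashCode() {
    int hash = Name.GetHashCode();
    foreach (var a in Arguments) hash = unchecked(hash * 31 + a.GetHashCode());
    return hash;
  }

  // For each newly parsed sub function, keep the old instance if one matches.
  public static List<SubFunctionSketch> Merge(
      IEnumerable<SubFunctionSketch> oldFunctions,
      IEnumerable<SubFunctionSketch> newFunctions) {
    var oldList = oldFunctions.ToList();
    return newFunctions
      .Select(n => oldList.FirstOrDefault(o => o.Equals(n)) ?? n)
      .ToList();
  }
}
```

Because the old instance itself is kept on a match, its configuration is preserved without copying individual settings.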

comment:45 in reply to: ↑ 43 Changed 5 weeks ago by dpiringe

Replying to gkronber:

Bugs / suggestions for improvements:

  • values for num are not optimized / changed (tested with paramopt and batchInterpreter) -> #3140
  • num values cannot be initialized to negative values (<num=-1.5>). Only via workaround: -<num=1.5>. -> #3140
  • Configuration for sub-functions (grammar, size limits) should not be lost when reparsing (especially when only a minor detail in the structure expression is changed).
  • power-symbol: maybe it would be better to throw in the interpreter only when problematic arguments are actually detected (negative base with real-valued power). This would simplify entry of structure-templates. (we should create a new ticket for this.)

added a way to keep old sub functions, see r18164

comment:46 Changed 3 weeks ago by gkronber

r18176: merged r18165:18174 from trunk to branch (resolving conflicts in the parser)

comment:47 Changed 3 weeks ago by gkronber

r18177: removed a special case from the Evaluate method because it can never be true. Maximization is fixed to false in this problem and therefore the ParamOptEvaluator is not available as an evaluator.

comment:48 Changed 3 weeks ago by gkronber

r18178: copied code with minor modifications from ParameterOptimizationEvaluator into the NMSEConstraintsEvaluator because the code in ParameterOptimizationEvaluator uses R² internally and is incompatible with the NMSEEvaluator.

comment:49 Changed 3 weeks ago by gkronber

r18179: improved parameter optimization for NMSEConstraintsEvaluator. Use LM directly instead of lsfit to improve efficiency by using vectorized callbacks.

comment:50 Changed 3 weeks ago by gkronber

Open issue: parameters are not written back to the trees stored in the individuals after optimization. The reason is that trees from the individual are cloned and combined with the (cloned) template to construct the full tree. We should think about a way to update optimized parameters in the trees (including parameters which occur in the template).

    private ISymbolicExpressionTree BuildTree(Individual individual) {
      if (StructureTemplate.Tree == null)
        throw new ArgumentException("No structure template defined!");

      var clonedTemplate = (ISymbolicExpressionTree)StructureTemplate.Tree.Clone();

      // build main tree
      foreach (var subFunctionTreeNode in clonedTemplate.IterateNodesPrefix().OfType<SubFunctionTreeNode>()) {
        var subFunctionTree = individual.SymbolicExpressionTree(subFunctionTreeNode.Name);

        // add new tree
        var subTree = subFunctionTree.Root.GetSubtree(0)  // Start
                                          .GetSubtree(0); // Offset
        subTree = (ISymbolicExpressionTreeNode)subTree.Clone();
        subFunctionTreeNode.AddSubtree(subTree);

      }
      return clonedTemplate;
    }
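
One conceivable way to address the open issue is to remember, while cloning, which original node each clone came from and to copy the optimized values back afterwards. The sketch below only illustrates that idea with a hypothetical minimal node class; it does not use HeuristicLab's tree classes and is not a proposal for the actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tree node; HeuristicLab's tree classes are not used here.
public sealed class NodeSketch {
  public double Parameter;
  public List<NodeSketch> Children = new List<NodeSketch>();

  // Deep clone that also records which original node each clone came from.
  public NodeSketch Clone(Dictionary<NodeSketch, NodeSketch> cloneToOriginal) {
    var copy = new NodeSketch { Parameter = Parameter };
    cloneToOriginal[copy] = this;
    foreach (var child in Children)
      copy.Children.Add(child.Clone(cloneToOriginal));
    return copy;
  }
}

public static class WriteBackSketch {
  // Copies the (optimized) parameters of the clones back to their originals.
  public static void WriteBack(Dictionary<NodeSketch, NodeSketch> cloneToOriginal) {
    foreach (var pair in cloneToOriginal)
      pair.Value.Parameter = pair.Key.Parameter;
  }
}
```

The dictionary relies on reference equality of nodes, so the same mapping could cover both the cloned template and the cloned sub-function trees.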

comment:51 Changed 11 days ago by dpiringe

r18182

  • fixed missing/wrong event registration for SubFunction and StructuredSymbolicRegressionSingleObjectiveProblem

comment:52 Changed 11 days ago by mkommend

r18183: Fixed bug in parameter optimization code of NMSE evaluator (the tree was never updated).

Last edited 11 days ago by mkommend

comment:53 Changed 10 days ago by mkommend

r18184: Refactored structured GP problem.

comment:54 Changed 10 days ago by mkommend

r18185: Removed unused view SubFunctionListView.

comment:55 Changed 10 days ago by mkommend

r18187: Refactored saved trees in structure template.

comment:56 Changed 10 days ago by mkommend

r18188: Fixed backwards compatibility of StructureTemplate

comment:57 Changed 9 days ago by mkommend

r18189: Fixed name of MultiEncodingCreator.

comment:58 Changed 9 days ago by mkommend

r18190:

  • Added parameters for parameter optimization / linear scaling in StructuredSymRegProblem.
  • Added license headers to StructureTemplate and StructuredSymRegProblem.
  • Fixed type in NMSEConstraint evaluator.

comment:59 Changed 9 days ago by mkommend

r18191: Extracted linear scaling functionality in a dedicated helper class.

comment:60 Changed 8 days ago by mkommend

r18192:

  • Extracted parameter optimization into dedicated helper utility.
  • Implemented evaluation in the structured SymReg problem directly.

comment:61 Changed 8 days ago by mkommend

r18194:

  • Added handling of numeric parameters in structure GP problem by using a real vector encoding.
  • Configured grammar in sub function.
  • Added property for numeric parameters in structure template.

comment:62 Changed 8 days ago by mkommend

r18195: Refactored creation of subfunctions in StructureTemplate.

comment:63 Changed 8 days ago by mkommend

r18196: Fixed bug in adjustment of linear scaling terms.

comment:64 Changed 8 days ago by mkommend

r18197: Omitted parameter optimization of variable weights in the template part of the tree.

comment:65 Changed 8 days ago by mkommend

  • Owner changed from dpiringe to gkronber
  • Status changed from accepted to reviewing

comment:66 follow-ups: Changed 8 days ago by gkronber

Review comments:

  • RealVector crossover raises an exception when the template contains only one numeric parameter
  • Entering an expression which uses a non-existent input variable (e.g. "f(xyz)") shows an error when parse is pressed the first time, but succeeds if it is pressed again without a change to the structure.

comment:67 Changed 8 days ago by mkommend

r18198: Fixed error in structure GP if only one numeric parameter is present in the template by providing a new copy crossover for real vectors.
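
The ticket does not show how the copy crossover is implemented. A minimal sketch of the general idea, assuming plain double arrays instead of HeuristicLab's RealVector and assuming that one parent is picked at random, might look like this (all names are hypothetical):

```csharp
using System;

// Hypothetical sketch; HeuristicLab's RealVector and crossover interfaces are not used.
public static class CopyCrossoverSketch {
  // Returns an independent copy of one randomly chosen parent.
  // Unlike crossovers that need a cut point, this also works for length-1 vectors.
  public static double[] Cross(Random random, double[] parent1, double[] parent2) {
    if (parent1.Length != parent2.Length)
      throw new ArgumentException("Parents must have the same length.");
    var chosen = random.Next(2) == 0 ? parent1 : parent2;
    return (double[])chosen.Clone();
  }
}
```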

comment:68 in reply to: ↑ 66 Changed 8 days ago by mkommend

Replying to gkronber:

Review comments:

  • RealVector crossover raises an exception when the template contains only one numeric parameter

addressed in r18198

Last edited 8 days ago by mkommend

comment:69 Changed 8 days ago by mkommend

r18199: Fixed parsing of variables in subfunctions.

comment:70 in reply to: ↑ 66 Changed 8 days ago by mkommend

Replying to gkronber:

  • Entering an expression which uses a non-existent input variable (e.g. "f(xyz)") shows an error when parse is pressed the first time, but succeeds if it is pressed again without a change to the structure.

addressed in r18199

The presence of variables cannot be checked during parsing of the template. Thus, this functionality has been removed; if a non-existent variable is used in the template, a runtime error is raised.

comment:71 Changed 7 days ago by gkronber

The ItemName of the problem is ... Single Objective Problem (single-objective). --> simplify

comment:72 Changed 5 days ago by mkommend

r18200: Improved item name and description of structured symreg problem.

comment:73 Changed 18 hours ago by gkronber

Todo gkronber: review and merge back to trunk. Todo David: rename class, change default grammar (and configuration), fix unit test.

Topics for future development:

  • Performance tuning (evaluate).
  • Instead of multi-encoding use a specific new encoding.
  • Improve GUI: don't show sub-functions on the right side of the tree but in the problem instead.
  • Shape-constraints for sub-functions.
  • Partial dependence plots / visualization / impact-calculation for sub-functions
  • Use (restricted) type-coherent grammar instead of arithmetic grammar as a default for sub-functions
  • Configuration of number of sub-trees for grammar symbols (currently 1 - 3 arguments).

Not directly related:

  • Exclusion of nodes in parameter optimization should be improved, remove code duplication for parameter optimization.
  • Remove code duplication for linear scaling.

Last edited 18 hours ago by gkronber

comment:74 Changed 16 hours ago by dpiringe

r18205

  • changed visibility of string constants in TypeCoherentExpressionGrammar from private to public
  • changed default grammar for SubFunction

comment:75 Changed 16 hours ago by dpiringe

r18206

  • renamed StructuredSymbolicRegressionSingleObjectiveProblem to StructureTemplateSymbolicRegressionProblem

comment:76 Changed 16 hours ago by dpiringe

r18207

  • updated test cases for StructureTemplateSymbolicRegressionProblem
  • updated optimizer template for StructureTemplateSymbolicRegressionProblem
  • updated project file for HeuristicLab.Problems.DataAnalysis.Symbolic.Regression (forgot to include last commit)