Opened 11 months ago

Closed 3 months ago

#2697 closed enhancement (done)

Refactor certain parts for data-based modeling

Reported by: gkronber Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Problems.DataAnalysis Version: 3.3.14
Keywords: Cc:

Description

  • Move translation to AutoDiff out of evaluator
  • Create ISymbolicExpressionTree for linear model from vector of coefficients
  • Create two-dimensional array from ProblemData

Change History (34)

comment:1 Changed 11 months ago by gkronber

  • Owner set to gkronber
  • Status changed from new to accepted

comment:2 Changed 11 months ago by gkronber

r14378:

  • created a folder for all classes related to transformation from and to trees
  • created a transformator which takes a tree and uses AutoDiff to produce a function and gradient function for the tree.
  • moved code from SymbolicRegressionConstantOptimizationEvaluator to TreeToAutoDiffTermTransformator to make AutoDiff for trees more accessible

comment:3 Changed 10 months ago by gkronber

r14390:

  • renaming of folder "Transformation" to "Converters" to distinguish between transformations for variables (from data preprocessing) and classes for transformation of trees.
  • renamed SymbolicDataAnalysisExpressionTreeSimplifier -> TreeSimplifier
  • Implemented a converter to create a linar model as a symbolic expression tree

comment:4 Changed 10 months ago by gkronber

r14391:

  • extended converter for linear models to support lagged variables and changed AR(k) to use this method

comment:5 Changed 10 months ago by gkronber

Todo: remove class scaling (needs persistence backwards compat in GPR)

comment:6 Changed 10 months ago by gkronber

r14393:

  • removed AlglibUtil.cs and added an extension method .ToArray(names, rows) to IDataset instead.
  • refactored transformation so that it is possible to apply an transformation without resetting the parameters
  • Used transformations instead of Scaling as far as possible.
  • Moved TakeEvery extension method to HL.Common

comment:7 Changed 10 months ago by gkronber

r14394: renamed TreeSimplifier in unit test

comment:8 Changed 10 months ago by gkronber

r14396: added methods to get training and test input matrices from ProblemData

comment:9 Changed 10 months ago by gkronber

r14400: reverse merge of r14378, r14390, r14391, r14393, r14394, r14396 because the changes cannot be merged over to the #2650 branch

comment:10 Changed 10 months ago by gkronber

This is needed by #745.

However, #2650 should be merged to the trunk first before we apply the changes reverted in r14400 because there are many conflicting changes.

comment:11 Changed 9 months ago by mkommend

r14507: Added caching of parameters and removed resetting of training and test partition in AdjustProblemDataProperties (used for changing the problem data of a solution) in DataAnalysisProblemData.

comment:12 Changed 6 months ago by gkronber

r14840: applied changes from r14378 again

comment:13 Changed 6 months ago by gkronber

r14843: applied r14390, r14391, r14393, r14394, r14396 again (resolving conflicts)

comment:14 Changed 6 months ago by gkronber

r14845: fixed build fail of test solution

Last edited 6 months ago by gkronber (previous) (diff)

comment:15 Changed 6 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

r14851: removed unnecessary variables

comment:16 Changed 6 months ago by gkronber

r14854: introduced original code for scaling from AlglibUtil into GaussianProcessModel to guarantee backwards compatibility regarding unit test results

comment:17 Changed 5 months ago by mkommend

#2697: Made symbol fields readonly in TreeSimplifier.

comment:18 follow-ups: Changed 5 months ago by mkommend

Review comments

Enumerable Extensions

  • Why has the ext. method TakeEvery been added, although it is never used?
    gkronber: Good question. I traced this back to r7154. Seems we never needed this. Can be removed.
    mkommend: Addressed with comment:19.

HL.Problems.DataAnalysis.Symbolic

  • Delete empty and unused folder Transformation
    mkommend: Addressed with comment:20.

HeuristicLab.Problems.DataAnalysis Transformations

  • Why do we need transformations?!?!
  • Can't we delete all of them?
  • If not, why don't we move them to the DataPreprocessing and forget about them?

gkronber: They are used in DataPreprocessing. After the changes in this ticket they are also used for scaling of input variables (instead of the Scaling class). Any symbolic regression model can have transformations. If we want to keep support for this at least the interface is necessary in the symbolic namespace.

Edit: After discussion we decided to move all transformations to the DataPreprocessing namespace and remove support for transformations from symbolic models.

Converters

  • I don't really get the benefit of the Convert class, besides listing somehow related conversion operations.
    gkronber: addressed with comment:22
  • AFAIK the TreeSimplifier is stateless? If that's the case why don't we provide a static method for simplification. e.g. var simplifiedTree = TreeSimplifier.Simplify(originalTree);.
    gkronber: addressed with comment:23
  • TreeToAutoDiffTermConverter encapsulates the functionality nicely. This new refactoring would also enable to use exception for control flow handling in TryConvertToAutoDiff(node,term) instead of always checking the transformation result (true or false).
    gkronber: addressed with comment:24

Finished reviewing the changes in this tickets.

Last edited 5 months ago by gkronber (previous) (diff)

comment:19 in reply to: ↑ 18 Changed 5 months ago by mkommend

Replying to mkommend:

Enumerable Extensions

  • Why has the ext. method TakeEvery been added, although it is never used?

gkronber: Good question. I traced this back to r7154. Seems we never needed this. Can be removed.

r14944: Removed TakeEvery methods from EnumerableExtensions.cs

comment:20 in reply to: ↑ 18 Changed 5 months ago by mkommend

Replying to mkommend:

HL.Problems.DataAnalysis.Symbolic

  • Delete empty and unused folder Transformation

r14945: Removed empty folder HL.Problems.DataAnalysis.Symbolic\Transformation.

comment:21 Changed 5 months ago by mkommend

  • Owner changed from mkommend to gkronber

comment:22 Changed 5 months ago by gkronber

r14948: removed static Convert class

comment:23 Changed 5 months ago by gkronber

r14949: made TreeSimplifier static

comment:24 Changed 5 months ago by gkronber

r14950: code improvement in TreeToAutoDiffTermConverter

comment:25 Changed 4 months ago by gkronber

  • Status changed from reviewing to readytorelease

comment:26 Changed 3 months ago by gkronber

Since MCTS has been removed (already merged to stable #2581) there are conflicts when merging this ticket to stable.

comment:27 Changed 3 months ago by gkronber

r15140: merged r14507 from trunk to stable

comment:28 Changed 3 months ago by gkronber

r15141: merged r14840 from trunk to stable

comment:29 Changed 3 months ago by gkronber

r15142: merged r14843 (resolving conflicts in csproj file for HL.Algorithms.DataAnalysis because MCTS has been removed)

comment:30 Changed 3 months ago by gkronber

r15143: merged r14845, r14851, r14854 from trunk to stable

Last edited 3 months ago by gkronber (previous) (diff)

comment:31 Changed 3 months ago by gkronber

r15144: merged r14944, r14945, r14948 from trunk to stable

Last edited 3 months ago by gkronber (previous) (diff)

comment:32 Changed 3 months ago by gkronber

r14938 also belongs to this ticket

comment:33 Changed 3 months ago by gkronber

r15145: merged r14938,r14949,r14950 from trunk to stable (all changesets merged)

comment:34 Changed 3 months ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed
Note: See TracTickets for help on using tickets.