Opened 4 years ago

Closed 3 years ago

#2143 closed feature request (done)

Pruning of introns/negative impact nodes in symbolic data analysis expressions

Reported by: bburlacu
Owned by: mkommend
Priority: medium
Milestone: HeuristicLab 3.3.10
Component: Problems.DataAnalysis.Symbolic
Version: 3.3.9
Keywords:
Cc: mkommend

Description

An operator/analyzer should be implemented that calculates the impacts of individual nodes in the solution individuals and removes the ones with an impact value below a certain user-defined threshold. The user should also be able to choose between only removing introns (nodes with zero impact) or nodes with a negative impact as well.
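
The requested behaviour can be illustrated with a minimal, language-neutral sketch. It is not the HeuristicLab implementation: the `Node` class, the `quality` measure (negative mean squared error), and the definition of impact as "quality before minus quality after replacing the subtree with a constant equal to its mean output" are all assumptions made for illustration. The names `threshold` and `only_zero_impact` are hypothetical stand-ins for the user-defined settings described above.

```python
import math
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List

Row = Dict[str, float]


@dataclass
class Node:
    op: str                                   # "add", "mul", "var" or "const"
    children: List["Node"] = field(default_factory=list)
    name: str = ""                            # variable name when op == "var"
    value: float = 0.0                        # constant value when op == "const"


def evaluate(node: Node, row: Row) -> float:
    """Evaluate the expression tree on a single data row."""
    if node.op == "const":
        return node.value
    if node.op == "var":
        return row[node.name]
    args = [evaluate(child, row) for child in node.children]
    if node.op == "add":
        return sum(args)
    if node.op == "mul":
        return math.prod(args)
    raise ValueError(f"unknown symbol: {node.op}")


def quality(tree: Node, rows: List[Row], targets: List[float]) -> float:
    """Assumed quality measure: negative MSE, so higher is better."""
    return -mean((evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets))


def prune(tree: Node, rows: List[Row], targets: List[float],
          threshold: float = 0.0, only_zero_impact: bool = False) -> int:
    """Replace low-impact subtrees with constants; return how many were pruned."""
    pruned = 0
    stack = [tree]
    while stack:
        parent = stack.pop()
        for i, child in enumerate(parent.children):
            if child.op == "const":
                continue  # already a constant, nothing to prune
            before = quality(tree, rows, targets)
            # Assumed replacement value: mean output of the subtree on the rows.
            parent.children[i] = Node("const", value=mean(evaluate(child, r) for r in rows))
            impact = before - quality(tree, rows, targets)
            is_intron = math.isclose(impact, 0.0, abs_tol=1e-12)
            if is_intron or (not only_zero_impact and impact < threshold):
                pruned += 1                   # keep the constant in place
            else:
                parent.children[i] = child    # restore and descend further
                stack.append(child)
    return pruned
```

With `only_zero_impact=True` only introns are removed; otherwise subtrees whose impact falls below `threshold` (including negative impacts) are removed as well.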

Change History (24)

comment:1 Changed 4 years ago by bburlacu

  • Status changed from new to accepted

comment:2 Changed 4 years ago by bburlacu

r10368: Implemented symbolic data analysis pruning operator and analyzers.

comment:3 Changed 4 years ago by bburlacu

r10375: Fixed incorrect accessibility levels of constructors for the pruning analyzers.

comment:4 Changed 4 years ago by bburlacu

r10378: Added storable constructors.

comment:5 Changed 4 years ago by bburlacu

r10414: Modified the pruning operator and analyzer to use the FitnessCalculationPartition for impact and replacement values calculation, instead of the whole training data partition.

comment:6 Changed 4 years ago by bburlacu

r10417: SymbolicDataAnalysisExpressionPruningOperator: changed cloning constructor accessibility to protected. Added storable constructor.

comment:7 Changed 4 years ago by bburlacu

r10418: Added license header to SymbolicClassificationPruningAnalyzer.cs

comment:8 Changed 4 years ago by bburlacu

r10428: Reset the pruned subtrees/trees counters to zero on reinitialization.

comment:9 Changed 4 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from accepted to reviewing

comment:10 Changed 4 years ago by mkommend

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.10

comment:11 Changed 4 years ago by mkommend

Reviewing comments:

  • SymbolicDataAnalysisSingleObjectivePruningAnalyzer
    • missing license headers
    • Don't abuse the StatefulItem methods for counters, use global scope variables instead
    • Use properties instead of variables for storable members (e.g., impact values calculator)
    • Program as defensively as possible - use the readonly modifier
    • Every value parameter must have a non-empty description
    • Use only one operator instance instead of creating multiple ones and subscopes processor
    • Use a DataTableValuesCollector for creating the DataTables
    • Use the mechanism of the MultiAnalyzer for checking the update interval
    • Returning a correctly configured operation collection will get rid of reentering the analyzer
  • SymbolicDataAnalysisExpressionPruningOperator
    • Use Lookup- and ValueParameters
    • Evaluation happens multiple times and should be unified, consider changing the impact value calculators
    • Do not use locks
    • Random property is not used at all
    • Move the model creation into the pruning operator

comment:12 Changed 4 years ago by mkommend

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

comment:13 Changed 4 years ago by bburlacu

r10469: Refactored pruning analyzer and operators as per review.

comment:14 Changed 4 years ago by bburlacu

r10470: Fixed deep cloning error and assembly plugin reference.

comment:15 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

r10955: Added extra check to ensure the PopulationSlice parameter is in the right range.

comment:16 Changed 3 years ago by bburlacu

r11013: Added missing check for population slice bounds, added PopulationSize parameter to get the number of individuals (instead of counting the subscopes of the current execution context).
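
A bounds check of this kind might look like the following sketch. It assumes, purely for illustration, that the population slice is given as a pair of fractions in [0, 1] that is mapped onto individual indices using the population size; the function name `slice_indices` and its parameters are hypothetical and not taken from the HeuristicLab sources.

```python
def slice_indices(population_size: int, start: float, end: float) -> range:
    """Map a fractional slice [start, end) of the population to index positions.

    Assumption: start and end are fractions of the population size.
    """
    if population_size < 0:
        raise ValueError("population size must be non-negative")
    if not (0.0 <= start <= end <= 1.0):
        raise ValueError("population slice must satisfy 0 <= start <= end <= 1")
    first = int(start * population_size)
    last = int(end * population_size)
    return range(first, last)
```

Passing the population size explicitly, rather than counting subscopes, keeps the check independent of the current execution context.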

comment:17 Changed 3 years ago by mkommend

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

Common things

  • missing license header
  • no properties for lookup parameters
  • properties for fixed value parameters should provide a getter and setter for the standard C# data types (e.g., for IFixedValueParameter<BoolValue> xParam: bool x { get { return xParam.Value.Value; } set { xParam.Value.Value = value; } })
  • empty lines between methods
  • avoid ToList in code for fitness calculation
  • unused properties and fields

SymbolicDataAnalysisExpressionPruningOperator

  • unify parameter descriptions
  • abstract methods should be below the ctors to make them more prominent
  • Quality of trees is not updated
  • Fitness calculation partition is used for impact calculation but the whole training partition is used for model evaluation !!!
  • Try to use ToList as rarely as possible, e.g. it is not necessary for the rows
  • Do not create a new Constant symbol, get the constant symbol from the grammar so that it has the correct local parameters

Test if any nodes are pruned if PruneOnlyZeroImpactNodes is active

SymbolicRegressionPruningOperator

  • avoid ToList
  • use the parameter property instead to access lookup parameter values
  • Evaluator parameter is not used => remove the parameter

Test if existing experiments can still be opened

SymbolicClassificationPruningOperator

  • ApplyLinearScalingParameter is never used => remove the parameter + property
  • Remove property for the model creator
  • CreateModel should use the whole training data
  • Evaluate should use only the fitness calculation partition
  • Avoid ToList
  • the classification quality is given in terms of the accuracy and not the R²
  • License header is missing

SymbolicDataAnalysisSingleObjectivePruningAnalyzer

  • move initialize operators to the end of the file
  • UpdateCounter, UpdateInterval, PruningProbability, PopulationSliceParameter should be FixedValueParameters
  • empty lines between regions
  • rework parameter descriptions
  • rename oc to operations or something similar and more descriptive
  • initialize the necessary operator in the ctor
  • remove property for the empty operator
  • do not create the operations for data collection by yourself => chain the successor parameters of the analyzer and the operators and let the engine do the work => base apply call must be inserted and the cloning won't be necessary

comment:18 Changed 3 years ago by bburlacu

r11025: Attempted to fix the problems described above.

comment:19 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

comment:20 Changed 3 years ago by bburlacu

r11026:

  • Removed ToArray() calls from SymbolicClassificationPruningOperator.cs and SymbolicRegressionPruningOperator.cs.
  • Changed UpdateCounter, UpdateInterval, PruningProbability, PopulationSliceParameter to FixedValueParameters in SymbolicDataAnalysisSingleObjectivePruningAnalyzer.cs.
  • Removed empty operator data member
  • Added parameter descriptions
  • Improved source organization

comment:21 Changed 3 years ago by bburlacu

r11027: SymbolicDataAnalysisSingleObjectivePruningAnalyzer.cs: Added some code in the AfterDeserialization hook to replace ValueParameters with FixedValueParameters in order to avoid an InvalidCastException when trying to run older saved algorithms.

comment:22 Changed 3 years ago by mkommend

Reviewed r11025, r11026 & r11027.

comment:23 Changed 3 years ago by mkommend

  • Status changed from reviewing to readytorelease

comment:24 Changed 3 years ago by mkommend

  • Resolution set to done
  • Status changed from readytorelease to closed