Opened 11 years ago
Closed 10 years ago
#2143 closed feature request (done)
Pruning of introns/negative impact nodes in symbolic data analysis expressions
| Reported by: | bburlacu | Owned by: | mkommend |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.10 |
| Component: | Problems.DataAnalysis.Symbolic | Version: | 3.3.9 |
| Keywords: | | Cc: | mkommend |
Description
An operator/analyzer should be implemented that calculates the impact of each node in a solution's expression tree and removes the nodes whose impact value is below a user-defined threshold. The user should also be able to choose between removing only introns (nodes with zero impact) and removing nodes with a negative impact as well.
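The requested behaviour can be sketched roughly as follows. This is a minimal Python illustration, not the HeuristicLab implementation: `Node`, `evaluate`, `quality`, `should_prune`, and `prune` are hypothetical names, negative mean squared error stands in for whatever quality measure the evaluator actually uses, and replacing a subtree by the mean of its outputs is an assumption about how the replacement value is chosen.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Node:
    op: str                                  # "add", "mul", "var" or "const"
    children: list = field(default_factory=list)
    value: float = 0.0                       # used by "const" nodes
    name: str = ""                           # used by "var" nodes


def evaluate(node, row):
    """Evaluate the expression tree on one data row (a dict of variables)."""
    if node.op == "const":
        return node.value
    if node.op == "var":
        return row[node.name]
    args = [evaluate(child, row) for child in node.children]
    if node.op == "add":
        return sum(args)
    if node.op == "mul":
        result = 1.0
        for arg in args:
            result *= arg
        return result
    raise ValueError(f"unknown symbol: {node.op}")


def quality(tree, rows, targets):
    """Negative mean squared error, so that higher is better (assumption)."""
    return -mean((evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets))


def should_prune(impact, threshold, only_zero_impact, eps=1e-12):
    """Introns have (numerically) zero impact; optionally prune below threshold."""
    if only_zero_impact:
        return abs(impact) <= eps
    return abs(impact) <= eps or impact < threshold


def prune(tree, rows, targets, threshold=0.0, only_zero_impact=True):
    """Replace low-impact subtrees by constants, top-down; return the count."""
    pruned = 0
    stack = [tree]
    while stack:
        parent = stack.pop()
        for i, child in enumerate(parent.children):
            if child.op == "const":
                continue                     # nothing to gain from a constant
            base = quality(tree, rows, targets)
            constant = mean(evaluate(child, r) for r in rows)
            parent.children[i] = Node("const", value=constant)
            # Impact = quality lost when the subtree is replaced by a constant.
            impact = base - quality(tree, rows, targets)
            if should_prune(impact, threshold, only_zero_impact):
                pruned += 1
            else:
                parent.children[i] = child   # restore and look deeper
                stack.append(child)
    return pruned


# Example data: the target is 2 * x; z is an unrelated variable.
rows = [{"x": 1.0, "z": 1.0}, {"x": 2.0, "z": -1.0},
        {"x": 3.0, "z": 1.0}, {"x": 4.0, "z": -1.0}]
targets = [2.0, 4.0, 6.0, 8.0]


def signal():
    return Node("mul", [Node("const", value=2.0), Node("var", name="x")])


# 2*x + 0*z: the second term is an intron (zero impact).
intron_tree = Node("add", [signal(), Node("mul", [Node("const", value=0.0),
                                                  Node("var", name="z")])])
# 2*x + z: the second term makes the fit worse (negative impact).
noisy_tree = Node("add", [signal(), Node("var", name="z")])
```

With `only_zero_impact=True` only the intron in `intron_tree` is replaced, while the harmful `z` term in `noisy_tree` is removed only when `only_zero_impact=False` and the threshold is at or above zero.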
Change History (24)
comment:1 Changed 11 years ago by bburlacu
- Status changed from new to accepted
comment:2 Changed 11 years ago by bburlacu
comment:3 Changed 11 years ago by bburlacu
r10375: Fixed incorrect accessibility levels of constructors for the pruning analyzers.
comment:4 Changed 11 years ago by bburlacu
r10378: Added storable constructors.
comment:5 Changed 11 years ago by bburlacu
r10414: Modified the pruning operator and analyzer to use the FitnessCalculationPartition for impact and replacement values calculation, instead of the whole training data partition.
comment:6 Changed 11 years ago by bburlacu
r10417: SymbolicDataAnalysisExpressionPruningOperator: changed cloning constructor accessibility to protected. Added storable constructor.
comment:7 Changed 11 years ago by bburlacu
r10418: Added license header to SymbolicClassificationPruningAnalyzer.cs
comment:8 Changed 11 years ago by bburlacu
r10428: Reset the pruned subtrees/trees counters to zero on reinitialization.
comment:9 Changed 11 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from accepted to reviewing
comment:10 Changed 11 years ago by mkommend
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.10
comment:11 Changed 11 years ago by mkommend
Reviewing comments:
- SymbolicDataAnalysisSingleObjectivePruningAnalyzer
  - missing license headers
  - Don't abuse the StatefulItem methods for counters; use global scope variables instead
  - Use properties instead of variables for storable members (e.g., impact values calculator)
  - Program as defensively as possible: use the readonly modifier
  - Every value parameter must have a non-empty description
  - Use only one operator instance instead of creating multiple ones and a subscopes processor
  - Use a DataTableValuesCollector for creating the DataTables
  - Use the mechanism of the MultiAnalyzer for checking the update interval
  - Returning a correctly configured operation collection removes the need to re-enter the analyzer
- SymbolicDataAnalysisExpressionPruningOperator
  - Use Lookup- and ValueParameters
  - Evaluation happens multiple times and should be unified; consider changing the impact value calculators
  - Do not use locks
  - Random property is not used at all
  - Move the model creation into the pruning operator
comment:12 Changed 11 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
comment:13 Changed 11 years ago by bburlacu
r10469: Refactored pruning analyzer and operators as per review.
comment:14 Changed 11 years ago by bburlacu
r10470: Fixed deep cloning error and assembly plugin reference.
comment:15 Changed 10 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
r10955: Added extra check to ensure the PopulationSlice parameter is in the right range.
comment:16 Changed 10 years ago by bburlacu
r11013: Added missing check for population slice bounds, added PopulationSize parameter to get the number of individuals (instead of counting the subscopes of the current execution context).
comment:17 Changed 10 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
Common things
- missing license header
- no properties for lookup parameters
- properties for fixed value parameters should provide a getter and setter for the standard C# data types (e.g., IFixedValueParameter<BoolValue> xParam => bool x { get { return xParam.Value.Value; } set { xParam.Value.Value = value; } })
- empty lines between methods
- avoid ToList in code for fitness calculation
- unused properties and fields
SymbolicDataAnalysisExpressionPruningOperator
- unify parameter descriptions
- abstract methods should be below the ctors to make them more prominent
- Quality of trees is not updated
- Fitness calculation partition is used for impact calculation but the whole training partition is used for model evaluation !!!
- Try to use ToList as rarely as possible, e.g. it is not necessary for the rows
- Do not create a new Constant symbol, get the constant symbol from the grammar so that it has the correct local parameters
Test if any nodes are pruned if PruneOnlyZeroImpactNodes is active
SymbolicRegressionPruningOperator
- avoid ToList
- use the parameter property instead to access lookup parameter values
- Evaluator parameter is not used => remove the parameter
Test if existing experiments can still be opened
SymbolicClassificationPruningOperator
- ApplyLinearScalingParameter is never used => remove the parameter + property
- Remove property for the model creator
- CreateModel should use the whole training data
- Evaluate should use only the fitness calculation partition
- Avoid ToList
- the classification quality is given in terms of accuracy, not R²
- License header is missing
SymbolicDataAnalysisSingleObjectivePruningAnalyzer
- move initialize operators to the end of the file
- UpdateCounter, UpdateInterval, PruningProbability, PopulationSliceParameter should be FixedValueParameters
- empty lines between regions
- rework parameter descriptions
- rename oc to operations or something similar and more descriptive
- initialize the necessary operator in the ctor
- remove property for the empty operator
- do not create the operations for data collection yourself: chain the successor parameters of the analyzer and the operators and let the engine do the work. A base Apply call must then be inserted, and the cloning will no longer be necessary
comment:18 Changed 10 years ago by bburlacu
r11025: Attempted to fix the problems described above.
comment:19 Changed 10 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
comment:20 Changed 10 years ago by bburlacu
- Removed ToArray() calls from SymbolicClassificationPruningOperator.cs and SymbolicRegressionPruningOperator.cs.
- Changed UpdateCounter, UpdateInterval, PruningProbability, PopulationSliceParameter to FixedValueParameters in SymbolicDataAnalysisSingleObjectivePruningAnalyzer.cs.
- Removed empty operator data member
- Added parameter descriptions
- Improved source organization
comment:21 Changed 10 years ago by bburlacu
r11027: SymbolicDataAnalysisSingleObjectivePruningAnalyzer.cs: Added some code in the AfterDeserialization hook to replace ValueParameters with FixedValueParameters in order to avoid an InvalidCastException when trying to run older saved algorithms.
comment:22 Changed 10 years ago by mkommend
comment:23 Changed 10 years ago by mkommend
- Status changed from reviewing to readytorelease
r10368: Implemented symbolic data analysis pruning operator and analyzers.