Opened 6 years ago
Closed 6 years ago
#2359 closed enhancement (done)
Improve the SymbolicDataAnalysisExpressionPruningOperator
Reported by: | bburlacu | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.12 |
Component: | Problems.DataAnalysis.Symbolic | Version: | 3.3.11 |
Keywords: | pruning, symbolic data analysis, classification, regression | Cc: |
Description
Some minor things need to be improved:
- provide static Prune methods that return a simplified tree (for classification and regression)
- set the impact values calculator in the constructor for the derived classes
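A minimal sketch of what a static Prune method could look like. It is written in Python rather than HeuristicLab's C#, and everything here is hypothetical: the `Node` class, the `prune` function, the `quality` callable (higher is better), and the choice of the mean subtree output as the replacement constant. Nodes are visited top-down. Each non-constant subtree is tentatively replaced by a constant, and the replacement is kept when the quality loss (the impact) does not exceed a threshold. The input tree is left untouched and a pruned copy is returned.

```python
import copy
from dataclasses import dataclass, field
from statistics import fmean
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]


@dataclass
class Node:
    """Hypothetical expression-tree node."""
    op: str                  # "const", "var", "+" or "*"
    value: float = 0.0       # used when op == "const"
    name: str = ""           # used when op == "var"
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, row: Row) -> float:
    if node.op == "const":
        return node.value
    if node.op == "var":
        return row[node.name]
    args = [evaluate(child, row) for child in node.children]
    if node.op == "+":
        return sum(args)
    if node.op == "*":
        product = 1.0
        for a in args:
            product *= a
        return product
    raise ValueError(f"unknown op: {node.op}")


def mse(tree: Node, rows: List[Row], target: List[float]) -> float:
    return fmean((evaluate(tree, r) - t) ** 2 for r, t in zip(rows, target))


def nodes_top_down(node: Node) -> Iterator[Node]:
    # node.children is read only after the caller resumes the generator,
    # so the children of a node that was just pruned are skipped.
    yield node
    for child in node.children:
        yield from nodes_top_down(child)


def prune(tree: Node, rows: List[Row], quality: Callable[[Node], float],
          threshold: float = 0.0) -> Node:
    """Return a pruned copy of `tree`; `quality` is higher-is-better."""
    pruned = copy.deepcopy(tree)
    current = quality(pruned)
    for node in nodes_top_down(pruned):
        if node.op == "const":
            continue
        replacement = fmean(evaluate(node, r) for r in rows)
        saved = (node.op, node.value, node.name, node.children)
        # Tentatively turn the subtree into a constant (in place).
        node.op, node.value, node.name, node.children = "const", replacement, "", []
        new_quality = quality(pruned)
        impact = current - new_quality
        if impact <= threshold:
            current = new_quality  # keep the replacement
        else:
            node.op, node.value, node.name, node.children = saved  # undo


    return pruned


rows = [{"x": 1.0, "y": 1.0}, {"x": 2.0, "y": -1.0},
        {"x": 3.0, "y": 1.0}, {"x": 4.0, "y": -1.0}]
target = [1.0, 2.0, 3.0, 4.0]
tree = Node("+", children=[Node("var", name="x"),
                           Node("*", children=[Node("const", value=0.0),
                                               Node("var", name="y")])])
pruned = prune(tree, rows, lambda t: -mse(t, rows, target))
# pruned is x + 0.0: the "*" subtree became a constant, the "x" leaf was kept
```

Returning a copy instead of mutating the argument matches the ticket's wording ("return a simplified tree"). A caller can then compare the original and pruned trees side by side.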
Change History (24)
comment:1 Changed 6 years ago by bburlacu
- Status changed from new to accepted
- Summary changed from Improve the `SymbolicDataAnalysisExpressionPruningOperator` to Improve the SymbolicDataAnalysisExpressionPruningOperator
comment:2 Changed 6 years ago by bburlacu
comment:3 Changed 6 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from accepted to reviewing
comment:4 Changed 6 years ago by mkommend
r12358: Refactored pruning operators and analyzers.
comment:5 Changed 6 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
r12359: Removed commented code from pruning analyzer.
comment:6 Changed 6 years ago by mkommend
Please add the number of removed nodes as an additional data series in the analyzer's data table. Furthermore, have a detailed look at the changes in r12358 and review them.
comment:7 Changed 6 years ago by bburlacu
comment:8 Changed 6 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
comment:9 Changed 6 years ago by ascheibe
- Owner changed from mkommend to gkronber
comment:10 Changed 6 years ago by gkronber
#2398 depends on this ticket
comment:11 Changed 6 years ago by gkronber
SymbolicDataAnalysisExpressionPruningOperator.Apply() produces incorrect quality values.
The problem is twofold. (1) It is assumed that the impacts of replacements are additive with respect to the quality value. First, the original quality is retrieved. Then, in the loop over all nodes, impacts are calculated repeatedly, and each time a node is pruned the quality is reduced by the impact value (quality -= impactValue).
(2) Impact calculators use accuracy (classification) or R² (regression) to calculate the impacts. However, the problem's evaluation operator can use a different measure (such as MSE or absolute error); therefore, we cannot simply subtract the impact from the quality.
Proposed solution: completely re-evaluate pruned models with the evaluation operator from the problem.
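A small numeric illustration of point (2), using made-up prediction vectors. The assumption here is that the impact is measured with Pearson's R² while the problem evaluator uses MSE. Shifting every prediction by a constant leaves R² unchanged, so the impact is 0 and `quality -= impact` keeps the MSE at 1.0. Re-evaluating the pruned predictions, however, gives 4.0.

```python
from statistics import fmean
from typing import Sequence


def pearson_r2(pred: Sequence[float], target: Sequence[float]) -> float:
    """Squared Pearson correlation (the impact metric assumed here)."""
    mp, mt = fmean(pred), fmean(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        return 0.0
    return cov * cov / (var_p * var_t)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error (the problem evaluator assumed here; lower is better)."""
    return fmean((p - t) ** 2 for p, t in zip(pred, target))


target = [1.0, 2.0, 3.0, 4.0]
original = [2.0, 3.0, 4.0, 5.0]  # hypothetical predictions before pruning
pruned = [3.0, 4.0, 5.0, 6.0]    # hypothetical predictions after pruning

impact = pearson_r2(original, target) - pearson_r2(pruned, target)  # 0.0
quality = mse(original, target)    # 1.0
quality -= impact                  # flawed update: still 1.0
reevaluated = mse(pruned, target)  # proposed fix: 4.0
```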
comment:12 Changed 6 years ago by gkronber
r12674: use stable sort in pruning analyzer.
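A minimal illustration of why sort stability matters when ranking by quality. The assumption is that the analyzer sorts individuals by quality and keeps a top fraction; the names, qualities and fraction below are hypothetical. Python's `sorted` is guaranteed to be stable, also with `reverse=True`, so individuals with equal quality keep their original order and the selection is reproducible. For comparison, .NET's `List<T>.Sort` is documented as an unstable sort, whereas LINQ's `OrderBy` is stable.

```python
from typing import List, Tuple


def select_for_pruning(population: List[Tuple[str, float]], fraction: float) -> List[str]:
    """Hypothetical selection: rank by quality (higher is better), keep the top fraction."""
    # sorted() is stable, also with reverse=True: equal qualities keep their input order.
    ranked = sorted(population, key=lambda ind: ind[1], reverse=True)
    count = int(len(ranked) * fraction)
    return [name for name, _ in ranked[:count]]


population = [("a", 0.5), ("b", 0.9), ("c", 0.5), ("d", 0.5)]
print(select_for_pruning(population, 0.5))  # ['b', 'a']
```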
comment:13 Changed 6 years ago by gkronber
- Owner changed from gkronber to bburlacu
- Status changed from reviewing to assigned
comment:14 Changed 6 years ago by bburlacu
- Regarding (1), CalculateImpactsAndReplacementValues internally uses the PearsonsRSquared measure (for regression) or the accuracy measure (for classification) to calculate impacts, which is exactly what the SymbolicDataAnalysisPruningOperator.Evaluate method provides. Passing in an originalQuality simply avoids recalculating it inside the method on each call. Since the impact is calculated as impactValue = originalQuality - newQuality, the updated originalQuality can be obtained within the for loop as quality -= impactValue, which avoids a re-evaluation between successive calls. The confusion here lies in the terminology: the originalQuality accepted by CalculateImpactsAndReplacementValues has no connection to the actual quality of the individual (which can be MSE, absolute error, etc.).
- (2) is indeed a problem, as the quality should not be updated that way. The culprit is the line QualityParameter.ActualValue.Value = quality, where, as you pointed out, we cannot assume anything about the problem's evaluation operator or the kind of quality measure it provides. Therefore, the solution should indeed be to completely re-evaluate pruned models with the evaluation operator from the problem.
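A sketch of the incremental update described in point (1), using classification accuracy as the impact metric. `accuracy`, `calculate_impact`, `tracked_qualities` and the prediction vectors are hypothetical stand-ins. The point is only that, because `impact = original_quality - new_quality`, the update `quality -= impact` reproduces the new quality of the same metric (up to floating-point rounding) without evaluating the original model again.

```python
from typing import List, Optional, Sequence


def accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def calculate_impact(original_pred: Sequence[int], replaced_pred: Sequence[int],
                     target: Sequence[int],
                     original_quality: Optional[float] = None) -> float:
    """Hypothetical impact calculator; impact = original quality - new quality."""
    if original_quality is None:  # only evaluated when the caller does not supply it
        original_quality = accuracy(original_pred, target)
    new_quality = accuracy(replaced_pred, target)
    return original_quality - new_quality


def tracked_qualities(steps: List[List[int]], target: List[int]) -> List[float]:
    """steps[0]: unpruned outputs; steps[k]: outputs after k accepted prunings."""
    quality = accuracy(steps[0], target)  # evaluated once, up front
    history = [quality]
    for before, after in zip(steps, steps[1:]):
        impact = calculate_impact(before, after, target, original_quality=quality)
        quality -= impact  # equals accuracy(after, target), up to rounding
        history.append(quality)
    return history


target = [0, 1, 1, 0]
steps = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
print(tracked_qualities(steps, target))  # [1.0, 0.75, 0.5]
```

The tracked value is only meaningful in the impact calculator's own metric (accuracy here), which is why it cannot be written back as the individual's quality.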
comment:15 Changed 6 years ago by bburlacu
r12720: Changed the impact calculators so that the quality value needed for the impact calculation is computed by a separate method. Refactored the CalculateImpactAndReplacementValues method to return the new quality in an out-parameter (and adjusted the method signature in the interface accordingly). Added an Evaluate method to the regression and classification pruning operators that re-evaluates the tree with the problem evaluator after pruning.
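A rough Python analogue of the r12720 design, under simplifying assumptions. The model is a sum of terms (each term's per-row outputs stand in for a subtree), a pruned term is replaced by its mean, impacts use Pearson's R², and the final quality comes from a separate, pluggable `problem_evaluator` (MSE in the example). The calculator returns `(impact, replacement, new_quality)` as a tuple where the C# code uses an out-parameter. All names are hypothetical.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def pearson_r2(pred: Sequence[float], target: Sequence[float]) -> float:
    mp, mt = fmean(pred), fmean(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        return 0.0
    return cov * cov / (var_p * var_t)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return fmean((p - t) ** 2 for p, t in zip(pred, target))


def predict(terms: List[Vector]) -> Vector:
    # The model output is the row-wise sum of all terms.
    return [sum(values) for values in zip(*terms)]


def calculate_impact_and_replacement(terms: List[Vector], index: int, target: Vector,
                                     quality: float) -> Tuple[float, float, float]:
    """Return (impact, replacement, new_quality); the tuple replaces out-parameters."""
    replacement = fmean(terms[index])
    trial = terms[:index] + [[replacement] * len(target)] + terms[index + 1:]
    new_quality = pearson_r2(predict(trial), target)
    return quality - new_quality, replacement, new_quality


def prune(terms: List[Vector], target: Vector,
          problem_evaluator: Callable[[Vector, Vector], float],
          threshold: float = 0.0) -> Tuple[List[Vector], float]:
    terms = [list(t) for t in terms]               # work on a copy
    quality = pearson_r2(predict(terms), target)   # impact metric only
    for i in range(len(terms)):
        impact, replacement, new_quality = calculate_impact_and_replacement(
            terms, i, target, quality)
        if impact <= threshold:
            terms[i] = [replacement] * len(target)
            quality = new_quality                  # carried to the next call
    # The reported quality comes from re-evaluating with the problem's evaluator,
    # never from the impact metric.
    return terms, problem_evaluator(predict(terms), target)


terms = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
pruned_terms, final_quality = prune(terms, [1.0, 2.0, 3.0, 4.0], mse)
# final_quality is 4.0 (MSE), while the internal R² of the pruned model is 1.0
```

Keeping the evaluator as a parameter means the pruning loop never needs to know which measure the problem reports, which is the point of the fix.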
comment:16 Changed 6 years ago by bburlacu
- Owner changed from bburlacu to gkronber
- Status changed from assigned to reviewing
comment:17 Changed 6 years ago by gkronber
Reviewed all changes and found that the pruning operators are not backwards compatible because parameters were added, removed, or changed type.
comment:18 Changed 6 years ago by gkronber
- Status changed from reviewing to assigned
comment:19 Changed 6 years ago by gkronber
- Status changed from assigned to accepted
comment:20 Changed 6 years ago by gkronber
r12744: added after-deserialization code for backwards-compatibility
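A Python analogue of after-deserialization fixups, assuming a hypothetical operator whose parameters live in a dict. The actual HeuristicLab code is C# and is not shown here. Python's `__setstate__` hook runs when an object is unpickled, so it can add parameters that objects stored by an older version lack and drop obsolete ones. This is the kind of compatibility code the comment refers to.

```python
import pickle
from typing import Any, Dict


class PruningOperator:
    """Hypothetical operator whose parameter set changed between versions."""

    def __init__(self) -> None:
        self.parameters: Dict[str, Any] = {
            "PruningRatio": 0.5,
            "EvaluatorParameter": "problem-evaluator",  # added in the newer version
        }

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Called by pickle after the stored attributes have been read back.
        self.__dict__.update(state)
        # Objects stored by an older version lack the new parameter: add a default.
        self.parameters.setdefault("EvaluatorParameter", "problem-evaluator")
        # Drop a parameter that the newer version no longer uses.
        self.parameters.pop("ObsoleteParameter", None)


# Simulate an object that was stored by the older version.
old = PruningOperator()
del old.parameters["EvaluatorParameter"]
old.parameters["ObsoleteParameter"] = 1
restored = pickle.loads(pickle.dumps(old))
print(sorted(restored.parameters))  # ['EvaluatorParameter', 'PruningRatio']
```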
comment:21 Changed 6 years ago by gkronber
comment:22 Changed 6 years ago by gkronber
- Status changed from accepted to reviewing
comment:23 Changed 6 years ago by gkronber
- Status changed from reviewing to readytorelease
comment:24 Changed 6 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
r12189: Implemented improvements