
Opened 9 years ago

Closed 9 years ago

#2359 closed enhancement (done)

Improve the SymbolicDataAnalysisExpressionPruningOperator

Reported by: bburlacu
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.12
Component: Problems.DataAnalysis.Symbolic
Version: 3.3.11
Keywords: pruning, symbolic data analysis, classification, regression
Cc:

Description

Some minor things need to be improved:

  • provide static Prune methods that return a simplified tree (for classification and regression)
  • set the impact values calculator in the constructor for the derived classes

Change History (24)

comment:1 Changed 9 years ago by bburlacu

  • Status changed from new to accepted
  • Summary changed from Improve the `SymbolicDataAnalysisExpressionPruningOperator` to Improve the SymbolicDataAnalysisExpressionPruningOperator

comment:2 Changed 9 years ago by bburlacu

r12189: Implemented improvements

comment:3 Changed 9 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from accepted to reviewing

comment:4 Changed 9 years ago by mkommend

r12358: Refactored pruning operators and analyzers.

comment:5 Changed 9 years ago by mkommend

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

r12359: Removed commented code from pruning analyzer.

comment:6 Changed 9 years ago by mkommend

Please add the number of removed nodes as an additional data series in the analyzer's data table. Furthermore, have a detailed look at the changes in r12358 and review them.

comment:7 Changed 9 years ago by bburlacu

r12361: The changes in r12358 look fine to me. Added total number of pruned nodes in the analyzer's data table. Removed unused parameter names in the SymbolicDataAnalysisSingleObjectivePruningAnalyzer.

comment:8 Changed 9 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

comment:9 Changed 9 years ago by ascheibe

  • Owner changed from mkommend to gkronber

comment:10 Changed 9 years ago by gkronber

#2398 depends on this ticket

comment:11 Changed 9 years ago by gkronber

SymbolicDataAnalysisExpressionPruningOperator.Apply() produces incorrect quality values.

The problem is twofold. (1) The code assumes that the impacts of replacements are additive with respect to the quality value. First the original quality is retrieved. Then, in the loop over all nodes, impacts are calculated repeatedly, and each time a node is pruned the quality is reduced by the impact value (quality -= impactValue).

(2) Impact calculators use accuracy (classification) or R² (regression) to calculate the impacts. However, the evaluation operator from the problem can use a different measure (such as MSE or absolute error), so we cannot simply subtract the impact from the quality.

Proposed solution: completely re-evaluate pruned models with the evaluation operator from the problem.
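Point (2) can be illustrated with a minimal Python sketch. This is not HeuristicLab code: the helper names and the toy data are assumptions chosen only to show the mismatch. The pruned model differs from the original by a constant offset, so its R²-based impact is zero while its MSE changes.

```python
def mse(y, pred):
    """Mean squared error (lower is better)."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

def r_squared(y, pred):
    """Squared Pearson correlation (higher is better)."""
    n = len(y)
    my, mp = sum(y) / n, sum(pred) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, pred))
    vy = sum((a - my) ** 2 for a in y)
    vp = sum((b - mp) ** 2 for b in pred)
    return 0.0 if vy == 0 or vp == 0 else cov * cov / (vy * vp)

# Toy data (assumed): pruning shifts every prediction by a constant.
y = [1.0, 2.0, 3.0, 4.0]
original_pred = [1.0, 2.0, 3.0, 4.0]
pruned_pred = [1.5, 2.5, 3.5, 4.5]

# The impact is measured with R², as the impact calculators do for regression.
impact = r_squared(y, original_pred) - r_squared(y, pruned_pred)

# Subtracting that impact from an MSE-based quality mixes two scales.
naive_quality = mse(y, original_pred) - impact

# Re-evaluating the pruned model with the problem's measure gives the real value.
true_quality = mse(y, pruned_pred)
```

Here the subtraction leaves the MSE unchanged, although the pruned model's actual MSE is worse, which is why a full re-evaluation is proposed.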

comment:12 Changed 9 years ago by gkronber

r12674: use stable sort in pruning analyzer.

comment:13 Changed 9 years ago by gkronber

  • Owner changed from gkronber to bburlacu
  • Status changed from reviewing to assigned

comment:14 Changed 9 years ago by bburlacu

  • Regarding (1), CalculateImpactsAndReplacementValues internally uses the PearsonsRSquared measure (for regression) and the accuracy measure (for classification) to calculate impacts, which is exactly what the SymbolicDataAnalysisPruningOperator.Evaluate method provides. Providing an originalQuality simply avoids recalculating it inside the method on each call. Since the impact is calculated as impactValue = originalQuality - newQuality, the new originalQuality can be obtained within the for loop as quality -= impactValue, which speeds up successive calls. The confusion lies in the terminology: the originalQuality accepted by CalculateImpactsAndReplacementValues has no connection to the actual quality of the individual (which can be MSE, absolute error, etc.).
  • (2) is indeed a problem, as the quality should not be updated that way. The problem is the line QualityParameter.ActualValue.Value = quality: as you pointed out, we cannot assume anything about the evaluation operator from the problem or the kind of quality measure it provides. Therefore, the solution should indeed be to completely re-evaluate pruned models with the evaluation operator from the problem.
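The bookkeeping described in (1) can be sketched in a few lines of Python. This is an illustration under assumptions, not the operator's code: the function names are hypothetical, and the successive pruned models are represented only by their predictions. Because impactValue is defined as originalQuality - newQuality on the same measure, subtracting it reproduces the new quality on that measure and avoids one evaluation per call.

```python
def accuracy(y, pred):
    """Fraction of correctly classified rows (the impact measure for classification)."""
    return sum(a == b for a, b in zip(y, pred)) / len(y)

def track_quality(y, prediction_steps):
    """Carry the impact-measure quality forward across successive prunings.

    prediction_steps[0] holds the predictions of the unpruned model; each
    later entry holds the predictions after one more node was pruned.
    """
    quality = accuracy(y, prediction_steps[0])
    history = [quality]
    for pruned_pred in prediction_steps[1:]:
        new_quality = accuracy(y, pruned_pred)
        impact_value = quality - new_quality
        quality -= impact_value  # same scale as the impact, so this equals new_quality
        history.append(quality)
    return history

# Toy data (assumed): the first pruning costs one correct row, the second has no effect.
y = [0, 1, 1, 0, 1]
steps = [
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
history = track_quality(y, steps)
```

The carried value stays on the accuracy scale throughout, so it is valid as input for the next impact calculation but says nothing about a quality reported by a different evaluator.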

comment:15 Changed 9 years ago by bburlacu

r12720: Changed the impact calculators so that the quality value necessary for impacts calculation is calculated with a separate method. Refactored the CalculateImpactAndReplacementValues method to return the new quality in an out-parameter (adjusted method signature in interface accordingly). Added Evaluate method to the regression and classification pruning operators that re-evaluates the tree using the problem evaluator after pruning was performed.
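The shape of this change can be sketched in Python. The sketch is a simplified illustration, not the HeuristicLab implementation: the model is reduced to a list of additive term columns, pruning a term means replacing it with its mean, and every name below is hypothetical. The impact calculation returns the new impact-measure quality alongside the impact and replacement value (standing in for the out-parameter), and the problem's evaluator is applied once after pruning.

```python
from statistics import mean

def r_squared(y, pred):
    """Squared Pearson correlation, used only for impact calculation."""
    my, mp = mean(y), mean(pred)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, pred))
    vy = sum((a - my) ** 2 for a in y)
    vp = sum((b - mp) ** 2 for b in pred)
    return 0.0 if vy == 0 or vp == 0 else cov * cov / (vy * vp)

def mse(y, pred):
    """Stand-in for the problem's evaluator (assumed to be MSE here)."""
    return mean((a - b) ** 2 for a, b in zip(y, pred))

def predict(terms):
    """The toy model's output is the row-wise sum of its terms."""
    return [sum(row) for row in zip(*terms)]

def replace_term(terms, i, value, n_rows):
    """Return a copy of terms with term i replaced by a constant column."""
    return terms[:i] + [[value] * n_rows] + terms[i + 1:]

def impact_and_replacement(terms, i, y, quality):
    """Return (impact, replacement value, new impact-measure quality)."""
    replacement = mean(terms[i])
    new_quality = r_squared(y, predict(replace_term(terms, i, replacement, len(y))))
    return quality - new_quality, replacement, new_quality

def prune(terms, y, evaluator, threshold=1e-9):
    """Prune low-impact terms, then re-evaluate once with the problem's evaluator."""
    quality = r_squared(y, predict(terms))  # quality for impact calculation only
    pruned_count = 0
    for i in range(len(terms)):
        impact, replacement, new_quality = impact_and_replacement(terms, i, y, quality)
        if impact <= threshold:
            terms = replace_term(terms, i, replacement, len(y))
            quality = new_quality
            pruned_count += 1
    return terms, pruned_count, evaluator(y, predict(terms))

# Toy data (assumed): the first term carries the signal, the second is noise.
y = [1.0, 2.0, 3.0, 4.0]
terms = [[1.0, 2.0, 3.0, 4.0], [0.1, -0.1, 0.1, -0.1]]
pruned_terms, pruned_count, final_quality = prune(terms, y, mse)
```

Keeping the impact-measure quality separate from the final evaluation means the reported quality always comes from the problem's own measure, whatever the impact calculator uses internally.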

comment:16 Changed 9 years ago by bburlacu

  • Owner changed from bburlacu to gkronber
  • Status changed from assigned to reviewing

comment:17 Changed 9 years ago by gkronber

Reviewed all changes and found that the pruning operators are not backwards compatible because parameters were added, removed, or changed in type.

Last edited 9 years ago by gkronber

comment:18 Changed 9 years ago by gkronber

  • Status changed from reviewing to assigned

comment:19 Changed 9 years ago by gkronber

  • Status changed from assigned to accepted

comment:20 Changed 9 years ago by gkronber

r12744: added after-deserialization code for backwards-compatibility

comment:21 Changed 9 years ago by gkronber

r12745: (combined stable merge #2398) merged r12189, r12358, r12359, r12361, r12461, r12674, r12720, r12744 from trunk to stable

Last edited 9 years ago by gkronber

comment:22 Changed 9 years ago by gkronber

  • Status changed from accepted to reviewing

comment:23 Changed 9 years ago by gkronber

  • Status changed from reviewing to readytorelease

comment:24 Changed 9 years ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed