#2612 closed feature request (done)

Regression tree models should support evaluation even when some of the variables are missing or contain missing values

Reported by: gkronber Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.14
Component: Algorithms.DataAnalysis Version: 3.3.13
Keywords: Cc:

Description (last modified by gkronber)

As described in the "Greedy Function Approximation" paper.

Change History (12)

comment:1 Changed 13 months ago by gkronber

  • Owner set to gkronber
  • Status changed from new to accepted

comment:2 Changed 13 months ago by gkronber

  • Owner changed from gkronber to pfleck
  • Status changed from accepted to assigned

The regression tree models can easily be extended to return a weighted average of the estimated values when a given variable is not available in the dataset. This can also be used for partial dependence plots: add only the variable for which the partial dependence should be calculated to the dataset.
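The weighted-average idea can be illustrated with a minimal Python sketch. This is not the HeuristicLab code: the names Node, predict and weight_left are hypothetical, and the sketch assumes each split node stores the fraction of training rows that went to its left child. When the split variable is absent from the row or is NaN, both subtrees are evaluated and blended by that fraction.

```python
import math
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Node:
    # A leaf has var=None and uses `value`; a split node uses the other fields.
    value: float = 0.0
    var: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Assumption: fraction of training rows that went to the left child.
    weight_left: float = 0.5


def predict(node: Node, row: Mapping[str, float]) -> float:
    """Evaluate a tree for one row; missing or NaN variables are averaged out."""
    if node.var is None:
        return node.value
    x = row.get(node.var)
    if x is None or math.isnan(x):
        # Variable unavailable: follow both branches and weight the results.
        return (node.weight_left * predict(node.left, row)
                + (1.0 - node.weight_left) * predict(node.right, row))
    child = node.left if x <= node.threshold else node.right
    return predict(child, row)


# Small example tree: split on x1, then on x2 in the right branch.
tree = Node(
    var="x1", threshold=2.0, weight_left=0.25,
    left=Node(value=1.0),
    right=Node(var="x2", threshold=0.0, weight_left=0.5,
               left=Node(value=3.0), right=Node(value=5.0)),
)

full = predict(tree, {"x1": 1.0, "x2": 0.0})              # all variables present
no_x1 = predict(tree, {"x2": -1.0})                       # x1 missing from the row
nan_x1 = predict(tree, {"x1": float("nan"), "x2": 1.0})   # x1 present but NaN
```

For a partial dependence curve on x2, one would call predict(tree, {"x2": v}) for each v on a grid, so that every other variable is averaged out by the split weights.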

r13895:

  • extended GBT to support calculation of partial dependence
  • changed persistence of regression tree models
  • added two unit tests.

comment:3 Changed 12 months ago by gkronber

  • Milestone changed from HeuristicLab 3.3.14 to HeuristicLab 3.3.15

In r13895 the persistence format for gradient boosted trees has been improved, and handling of missing values for evaluation of GBT models has been added.

The related ticket #2622 is concerned with adding correct handling of missing values to the training phase.

A view for plotting the partial dependence has not yet been added. Therefore, I'm moving this ticket to the next milestone.

comment:4 Changed 12 months ago by mkommend

  • Milestone changed from HeuristicLab 3.3.15 to HeuristicLab 3.3.14
  • Owner changed from pfleck to gkronber

This ticket blocks the release of several others, e.g. #2604, #2541, or #1795, because they depend on the changes to gradient boosted trees in r13895.

Last edited 12 months ago by gkronber (previous) (diff)

comment:5 Changed 12 months ago by mkommend

Reviewed r13895. The only changes in r13895 are the handling of missing values in the tree models and the change of the persistence format.

Last edited 12 months ago by gkronber (previous) (diff)

comment:6 Changed 12 months ago by gkronber

r14015: added NaN handling for the evaluation of regression tree models (GBT)

comment:7 Changed 12 months ago by gkronber

  • Description modified (diff)
  • Summary changed from Partial dependence plots for gradient boosted trees to Regression tree models should support evaluation even when some of the variables are missing or contain missing values

comment:8 Changed 12 months ago by gkronber

  • Status changed from assigned to reviewing

comment:9 Changed 12 months ago by gkronber

  • Status changed from reviewing to readytorelease

comment:10 Changed 12 months ago by gkronber

r14016: reverse merge of r14015

comment:11 Changed 12 months ago by gkronber

r14017: added NaN handling for the evaluation of regression tree models (GBT) (again, see r14015).

comment:12 Changed 12 months ago by mkommend

  • Resolution set to done
  • Status changed from readytorelease to closed

r14023: Merged r13895 and r14017 into stable. r14015 and r14016 have been recorded in the merge info, but haven't actually been merged.
