Opened 8 years ago
Last modified 6 years ago
#2613 assigned defect
Improve missing values handling in GBT
Reported by: | gkronber | Owned by: | fholzing |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.17 |
Component: | Algorithms.DataAnalysis | Version: | 3.3.13 |
Keywords: | | Cc: | |
Description (last modified by gkronber)
Currently, GBT does not handle missing values (represented as NaN in our case) explicitly. The training phase uses only comparisons; therefore, NaN values are implicitly treated as very small (or very large) values (1). When calculating estimated values, the output of the right subtree is always used for NaN values.
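To illustrate the current behaviour, here is a minimal Python sketch. It is not HeuristicLab code: the split rule `x <= threshold` and the helpers `goes_left`, `partition` and `predict_stump` are hypothetical. Because every ordered comparison with NaN is false, a NaN row never satisfies the left-branch condition. It therefore ends up on the right both during training and when estimated values are calculated. Whether NaN behaves like a very small or a very large value depends only on which way the comparison is written.

```python
import math
from typing import List, Sequence, Tuple


def goes_left(x: float, threshold: float) -> bool:
    # Hypothetical split rule: a row goes left iff x <= threshold.
    # Every ordered comparison with NaN is False, so NaN never goes left.
    return x <= threshold


def partition(xs: Sequence[float], threshold: float) -> Tuple[List[float], List[float]]:
    # Training-time split of one column: NaN rows land on the right,
    # exactly where very large values would go.
    left = [x for x in xs if goes_left(x, threshold)]
    right = [x for x in xs if not goes_left(x, threshold)]
    return left, right


def predict_stump(x: float, threshold: float, left_value: float, right_value: float) -> float:
    # Estimated value of a one-split tree: NaN always takes the right subtree.
    return left_value if goes_left(x, threshold) else right_value


nan = float("nan")
left, right = partition([1.0, 5.0, nan, 9.0], 4.0)
print(left)                                            # [1.0]
print(len(right), any(math.isnan(x) for x in right))   # 3 True
print(predict_stump(nan, 4.0, -1.0, 1.0))              # 1.0
```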
I think it would be better to do the following:
- in training, ignore NaN values of input variables
- when calculating estimated values, use a weighted sum of the subtree outputs when the value of a splitting variable is NaN (see #2612).
(1) This follows from the IEEE 754 definition that every ordered comparison (<, <=, >, >=) involving NaN evaluates to false.
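The two proposals could look roughly like the following minimal Python sketch. It assumes a squared-error regression tree with the split rule `x <= threshold`. The names `Node`, `best_split`, `predict` and `left_weight` are hypothetical and are not HeuristicLab's API. Training ignores rows whose split variable is NaN and records the fraction of the remaining rows that went left. At prediction time, a NaN split variable blends both subtrees with those fractions. This weighting is only one common choice; #2612 may define the weights differently.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    value: float = 0.0            # leaf prediction
    feature: int = -1             # -1 marks a leaf
    threshold: float = 0.0
    left_weight: float = 0.5      # share of non-NaN training rows sent left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def best_split(xs: Sequence[float], ys: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, left_weight) minimising squared error, ignoring
    rows where x is NaN; None if no split is possible."""
    rows = sorted((x, y) for x, y in zip(xs, ys) if not math.isnan(x))
    n = len(rows)
    total = sum(y for _, y in rows)
    total_sq = sum(y * y for _, y in rows)
    left_sum = left_sq = 0.0
    best, best_sse = None, math.inf
    for i in range(n - 1):
        x, y = rows[i]
        left_sum += y
        left_sq += y * y
        if x == rows[i + 1][0]:
            continue  # cannot split between identical values
        nl, nr = i + 1, n - i - 1
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / nl) + (right_sq - right_sum ** 2 / nr)
        if sse < best_sse:
            best_sse = sse
            best = ((x + rows[i + 1][0]) / 2, nl / n)
    return best


def predict(node: Node, x: Sequence[float]) -> float:
    if node.feature < 0:
        return node.value
    v = x[node.feature]
    if math.isnan(v):
        # Missing split variable: weighted sum of both subtree outputs.
        w = node.left_weight
        return w * predict(node.left, x) + (1 - w) * predict(node.right, x)
    return predict(node.left if v <= node.threshold else node.right, x)


nan = float("nan")
threshold, w = best_split([1.0, 2.0, nan, 8.0, 9.0, 10.0],
                          [0.0, 0.0, 5.0, 1.0, 1.0, 1.0])   # NaN row ignored
# Leaf values are hard-coded here to match the data on each side.
tree = Node(feature=0, threshold=threshold, left_weight=w,
            left=Node(value=0.0), right=Node(value=1.0))
print(threshold, w)                 # 5.0 0.4
print(predict(tree, [nan]))         # 0.4 * 0.0 + 0.6 * 1.0
```

In this example, a NaN input no longer falls into the right subtree unconditionally. It receives a prediction between the two leaf values, weighted by how the non-missing training rows were split.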
Change History (3)
comment:1 Changed 7 years ago by gkronber
- Description modified (diff)
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.16
- Owner changed from gkronber to mkommend
- Status changed from new to assigned
comment:2 Changed 7 years ago by mkommend
- Owner changed from mkommend to fholzing
comment:3 Changed 6 years ago by fholzing
- Milestone changed from HeuristicLab 3.3.16 to HeuristicLab 3.3.17