Opened 8 years ago
Closed 7 years ago
#2779 closed feature request (done)
For model validation and inspection an analysis of residuals over input variables could be insightful
Reported by: | gkronber | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.15 |
Component: | Problems.DataAnalysis.Views | Version: | 3.3.14 |
Keywords: | | Cc: | |
Description
In model validation, we should check whether the distribution of residuals is independent of the inputs and the target variable. If patterns are visible in the distribution of residuals, this is a hint that the model does not fit the available data well.
We already have the nice bubble chart for the analysis of experiments, but it works only for runs and not for arbitrary tabular data. However, the bubble chart easily handles 10,000 runs, so it could potentially be used for this purpose as well.
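The check described above can be illustrated outside of HeuristicLab with a minimal Python sketch. It is not the view's implementation: the function names, the sign convention (target minus prediction), the binning scheme, and the data are all assumptions made for illustration. The idea is to sort the rows by one input variable, split them into bins, and compare the mean residual per bin. For a well-fitting model, the bin means should scatter around zero without a trend.

```python
from statistics import mean

def residuals(targets, predictions):
    """Residual per row, assumed here to be target minus prediction."""
    return [t - p for t, p in zip(targets, predictions)]

def binned_mean_residuals(x, res, bins=4):
    """Sort rows by input x, split them into `bins` groups, and average the residuals per group."""
    if len(x) < bins:
        raise ValueError("need at least one row per bin")
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(order) // bins
    means = []
    for b in range(bins):
        # The last bin takes any leftover rows.
        idx = order[b * size:(b + 1) * size] if b < bins - 1 else order[b * size:]
        means.append(mean(res[i] for i in idx))
    return means

# Made-up data: the true relation is quadratic, but the (hypothetical) model is linear.
x = [i / 10 for i in range(40)]
y = [v * v for v in x]
pred = [3.9 * v - 2.5 for v in x]

bin_means = binned_mean_residuals(x, residuals(y, pred))
print(bin_means)
```

In this constructed case, the bin means are positive at both ends of the input range and negative in the middle. That is the kind of systematic pattern that suggests the model misses some structure in the data.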
Change History (19)
comment:1 Changed 8 years ago by gkronber
comment:2 Changed 8 years ago by gkronber
r14890: removed reference to resx file
comment:3 Changed 8 years ago by gkronber
TODO:
- Clean up code
- Add absolute rel. error and abs. error
- Maybe: restrict number of entries in xAxis dropdown box
  - decided against this as I found that it is convenient to color rows by residuals and then show a scatter plot of two input variables.
- Move calculated entries (error, prediction, ...) to the top of the dropdown box
- Remove entries in the dropdown box which are constant
comment:4 follow-up: ↓ 6 Changed 8 years ago by abeham
Having such a view for any tabular data would be really awesome. Maybe something like a dataframe in R? The IDataset is already pretty close to a dataframe. And the Dataset could easily be moved to HeuristicLab.Data where it would be more generally usable as a data structure for tabular data. Row names would probably still be nice to have...
comment:5 Changed 8 years ago by gkronber
- cleaned code (use '>' as a marker for calculated variables)
- added absolute residual and error
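As a rough illustration of what such calculated variables could look like, here is a small Python sketch. Only the '>' marker is taken from the comment above. The column names, the sign convention of the residual, and the definition of the relative error are assumptions, not the actual HeuristicLab code.

```python
import math

MARKER = ">"  # prefix that marks calculated variables

def add_calculated_columns(table, target_name, predictions):
    """Return a new table with the calculated columns listed before the original ones."""
    targets = table[target_name]
    res = [t - p for t, p in zip(targets, predictions)]  # assumed: target minus prediction
    calculated = {
        f"{MARKER} Prediction": list(predictions),
        f"{MARKER} Residual": res,
        f"{MARKER} Abs. residual": [abs(r) for r in res],
        # Assumed definition: |residual| / |target|, undefined (NaN) when the target is zero.
        f"{MARKER} Rel. error": [
            abs(r) / abs(t) if t != 0 else math.nan for r, t in zip(res, targets)
        ],
    }
    # Python dicts keep insertion order, so the calculated entries come first.
    return {**calculated, **table}

# Hypothetical two-row dataset with one input x1 and target y.
table = {"x1": [1.0, 2.0], "y": [2.0, 4.0]}
extended = add_calculated_columns(table, "y", [1.5, 5.0])
print(list(extended))
```

Because the calculated entries are inserted first, they would also appear at the top of any list built from the column names.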
comment:6 in reply to: ↑ 4 Changed 8 years ago by gkronber
Replying to abeham:
> Having such a view for any tabular data would be really awesome. Maybe something like a dataframe in R? The IDataset is already pretty close to a dataframe. And the Dataset could easily be moved to HeuristicLab.Data where it would be more generally usable as a data structure for tabular data. Row names would probably still be nice to have...
Related to efforts in #2726?
comment:7 Changed 8 years ago by abeham
#2726 is based on IndexedDataTable showing algorithm performance over evaluations as a line chart. I don't think it is related.
comment:8 Changed 7 years ago by gkronber
- Status changed from new to accepted
comment:9 Changed 7 years ago by gkronber
r15024: hide 'constant variables' (only one distinct value)
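The rule stated in this changeset message (a variable is constant if it has only one distinct value) can be sketched in a few lines of Python. The function name and the dictionary-based table are hypothetical, and missing values are not treated specially here.

```python
def non_constant_columns(table):
    """Names of columns that contain more than one distinct value."""
    return [name for name, values in table.items() if len(set(values)) > 1]

# Hypothetical table: "a" and "c" are constant and would be hidden.
table = {"a": [1, 1, 1], "b": [1, 2, 1], "c": ["x", "x", "x"]}
visible = non_constant_columns(table)
print(visible)
```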
comment:10 Changed 7 years ago by gkronber
I have used and tested this extensively in the last few weeks and found it really helpful to find systematic errors in models.
comment:11 Changed 7 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from accepted to reviewing
comment:12 Changed 7 years ago by mkommend
r15088: Added target as calculated feature to ResidualAnalysisView.
comment:13 Changed 7 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from reviewing to readytorelease
comment:14 Changed 7 years ago by mkommend
It would be cool to have the solution evaluation views semantically sorted.
comment:15 Changed 7 years ago by mkommend
- Owner changed from gkronber to mkommend
- Status changed from readytorelease to reviewing
comment:16 Changed 7 years ago by mkommend
- Owner changed from mkommend to gkronber
r15094: Implemented necessary methods in the Dataset and ResidualAnalysisView to handle dates correctly.
comment:17 Changed 7 years ago by gkronber
- Status changed from reviewing to readytorelease
comment:18 Changed 7 years ago by gkronber
comment:19 Changed 7 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
r14889: added a solution view which uses the bubble chart for interactive visualization of model residuals. (HACK)
Also made small modifications to the bubble chart.