#2779 closed feature request (done)

For model validation and inspection, an analysis of residuals over input variables could be insightful

Reported by: gkronber
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.15
Component: Problems.DataAnalysis.Views
Version: 3.3.14
Keywords:
Cc:

Description

In model validation we should check whether the distribution of residuals is independent of the inputs and the target variable. If patterns are visible in the distribution of residuals, this is a hint that the model does not fit the available data well.
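A minimal sketch of such a check, in Python rather than HeuristicLab code: rows are sorted by one input variable, split into bins, and the mean residual per bin is compared. The function names, the sign convention (residual = target - prediction), and the toy data are assumptions made for illustration only.

```python
from statistics import mean

def residuals(targets, predictions):
    """Residual = target - prediction, one value per row (assumed convention)."""
    return [t - p for t, p in zip(targets, predictions)]

def mean_residual_by_bin(inputs, res, n_bins=4):
    """Sort rows by the input variable, split them into n_bins groups of
    nearly equal size, and return the mean residual of each group."""
    pairs = sorted(zip(inputs, res))
    size = len(pairs) / n_bins
    bins = []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        if chunk:
            bins.append(mean(r for _, r in chunk))
    return bins

# Toy data: the true relation is quadratic, the hypothetical model is linear.
x = [i / 10 for i in range(0, 41)]
y = [v * v for v in x]
y_hat = [4 * v - 2 for v in x]

bins = mean_residual_by_bin(x, residuals(y, y_hat))
# Positive means at both ends and negative means in the middle indicate a
# systematic pattern, i.e. the residuals are not independent of x.
print(bins)
```

For a well-fitting model the per-bin means should scatter around zero without a visible trend; a scatter plot of residuals over the input shows the same thing interactively.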

We already have the nice bubble chart for the analysis of experiments, but it works only for runs and not for arbitrary tabular data. However, the bubble chart easily handles 10,000 runs, so it could potentially be used for this purpose as well.

Change History (19)

comment:1 Changed 17 months ago by gkronber

r14889: added a solution view which uses the bubble chart for interactive visualization of model residuals. (HACK)

Also made small modifications to the bubble chart.

  • layout change of controls in the bottom left to save space
  • better sorting of entries in the comboboxes using the NaturalStringComparer
  • properties to get and set the currently selected xAxis and yAxis values
Last edited 17 months ago by gkronber
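To illustrate why natural ordering helps in the comboboxes, here is a small Python sketch of a natural sort key. It is not the NaturalStringComparer from HeuristicLab, only an analogue written under the assumption that digit runs should compare as numbers; `natural_key` and the sample names are hypothetical.

```python
import re

def natural_key(text):
    """Build a sort key in which digit runs compare as integers and the
    text between them compares case-insensitively."""
    parts = re.split(r"([0-9]+)", text)
    key = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd positions hold the captured digit runs.
            key.append((0, int(part), ""))
        elif part:
            # Even positions hold the text between digit runs.
            key.append((1, 0, part.lower()))
    return key

names = ["x10", "x2", "x1", "residual"]
print(sorted(names))                   # plain string order puts x10 before x2
print(sorted(names, key=natural_key))  # natural order puts x2 before x10
```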

comment:2 Changed 17 months ago by gkronber

r14890: removed reference to resx file

comment:3 Changed 17 months ago by gkronber

TODO:

  • Clean up code
  • Add absolute relative error and absolute error
  • Maybe: restrict number of entries in xAxis dropdown box
    • decided against this as I found that it is convenient to color rows by residuals and then show a scatter plot of two input variables.
  • Move calculated entries (error, prediction, ...) to the top of the dropdown box
  • Remove entries in the dropdown box that are constant
Last edited 16 months ago by gkronber

comment:4 follow-up: Changed 17 months ago by abeham

Having such a view for any tabular data would be really awesome. Maybe something like a dataframe in R? The IDataset is already pretty close to a dataframe. And the Dataset could easily be moved to HeuristicLab.Data where it would be more generally usable as a data structure for tabular data. Row names would probably still be nice to have...

comment:5 Changed 17 months ago by gkronber

r14943:

  • cleaned code (use '>' as a marker for calculated variables)
  • added absolute residual and error
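The calculated variables could be derived roughly as in the following Python sketch. The column names, the '>' prefix placement, the sign convention, and the handling of zero targets are assumptions for illustration; the ticket does not specify the exact definitions used in the view.

```python
import math

def calculated_columns(targets, predictions):
    """Return derived columns keyed by hypothetical names that carry the
    '>' marker for calculated variables."""
    res = [t - p for t, p in zip(targets, predictions)]
    # Relative error is undefined for a zero target; NaN is used here.
    rel = [r / t if t != 0 else math.nan for r, t in zip(res, targets)]
    return {
        ">Target": list(targets),
        ">Prediction": list(predictions),
        ">Residual": res,
        ">Absolute Residual": [abs(r) for r in res],
        ">Relative Error": rel,
        ">Absolute Relative Error": [abs(r) for r in rel],
    }

cols = calculated_columns([2.0, 4.0], [1.0, 5.0])
print(cols[">Residual"], cols[">Absolute Relative Error"])
```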

comment:6 in reply to: ↑ 4 Changed 17 months ago by gkronber

Replying to abeham:

> Having such a view for any tabular data would be really awesome. Maybe something like a dataframe in R? The IDataset is already pretty close to a dataframe. And the Dataset could easily be moved to HeuristicLab.Data where it would be more generally usable as a data structure for tabular data. Row names would probably still be nice to have...

Related to efforts in #2726?

comment:7 Changed 17 months ago by abeham

#2726 is based on IndexedDataTable showing algorithm performance over evaluations as a line chart. I don't think it is related.

comment:8 Changed 16 months ago by gkronber

  • Status changed from new to accepted

comment:9 Changed 16 months ago by gkronber

r15024: hide 'constant variables' (only one distinct value)
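The filter described here (a variable counts as constant when it has only one distinct value) can be sketched in a few lines of Python. The dictionary-of-columns table and the function name are hypothetical stand-ins, not the Dataset API.

```python
def non_constant_columns(table):
    """Return the names of columns that hold more than one distinct value."""
    return [name for name, values in table.items() if len(set(values)) > 1]

table = {
    "x1": [1, 2, 3],
    "x2": [5, 5, 5],        # constant, would be hidden
    "x3": ["a", "b", "a"],
}
print(non_constant_columns(table))
```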

comment:10 Changed 16 months ago by gkronber

I have used and tested this extensively over the last few weeks and found it really helpful for finding systematic errors in models.

Last edited 16 months ago by gkronber

comment:11 Changed 16 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:12 Changed 15 months ago by mkommend

r15088: Added target as calculated feature to ResidualAnalysisView.

comment:13 Changed 15 months ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

Reviewed all changes in this ticket (r14889, r14890, r14943, r15024, r15088).

comment:14 Changed 15 months ago by mkommend

It would be cool to have the solution evaluation views semantically sorted.

comment:15 Changed 15 months ago by mkommend

  • Owner changed from gkronber to mkommend
  • Status changed from readytorelease to reviewing

comment:16 Changed 15 months ago by mkommend

  • Owner changed from mkommend to gkronber

r15094: Implemented necessary methods in the Dataset and ResidualAnalysisView to handle dates correctly.

comment:17 Changed 15 months ago by gkronber

  • Status changed from reviewing to readytorelease

Reviewed r15088 and r15094. Thanks for the corresponding changes in IDataset and Dataset.

comment:18 Changed 15 months ago by gkronber

r15161: merged r14889, r14890, r14943, r15024, r15088, r15094 from trunk to stable (all changesets merged)

comment:19 Changed 15 months ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed