Opened 6 months ago

Closed 3 months ago

#2779 closed feature request (done)

For model validation and inspection, an analysis of residuals over input variables could be insightful

Reported by: gkronber Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Problems.DataAnalysis.Views Version: 3.3.14
Keywords: Cc:

Description

In model validation, we should check whether the distribution of residuals is independent of the inputs and the target variable. If patterns are visible in the distribution of residuals, this is a hint that the model does not fit the available data well.
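
The bubble chart makes such patterns visible; a crude numeric counterpart is to sort the rows by one variable, split them into bins and compare the mean residual of each bin. The Python sketch below only illustrates that idea and is not part of HeuristicLab. The helper names `residuals` and `binned_residual_means`, the sign convention (target minus prediction) and the default of three bins are assumptions.

```python
def residuals(target, predictions):
    """Residuals, assumed here to be target minus prediction."""
    return [t - p for t, p in zip(target, predictions)]


def binned_residual_means(values, res, n_bins=3):
    """Sort rows by one variable, split them into n_bins groups of (nearly) equal size
    and return the mean residual of each group. Means that drift away from zero in a
    systematic way (trend, U-shape, ...) suggest the residuals depend on that variable."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    means = []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        chunk = [res[i] for i in order[lo:hi]]
        means.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return means


# Hypothetical data: the target is x^2, but the model is a straight line.
x = list(range(9))
y = [v * v for v in x]
pred = [8 * v - 10 for v in x]
print(binned_residual_means(x, residuals(y, pred)))  # positive, negative, positive: a U-shape
```

Passing the target column as `values` checks the residuals against the target in the same way. In this symmetric example the linear correlation between x and the residuals is exactly zero, while the bin means still expose the U-shape.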

We already have the nice bubble chart for the analysis of experiments, but it works only for runs and not for arbitrary tabular data. However, the bubble chart easily handles 10,000 runs, so it could potentially be used for this purpose as well.

Change History (19)

comment:1 Changed 6 months ago by gkronber

r14889: added a solution view which uses the bubble chart for interactive visualization of model residuals. (HACK)

Also made small modifications to the bubble chart.

  • changed the layout of the controls in the bottom left to save space
  • better sorting of combobox entries using the NaturalStringComparer
  • properties to get and set the currently selected xAxis and yAxis values
Last edited 6 months ago by gkronber
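
A natural string comparer orders strings so that embedded numbers compare by value (`X2` before `X10`), which is what makes variable lists like `X1 ... X10` readable in a combobox. The following Python sketch illustrates the idea with a sort key; it is not HeuristicLab's C# NaturalStringComparer, and the function `natural_key` and its case-insensitive handling of the text parts are assumptions.

```python
import re


def natural_key(s):
    """Sort key that compares digit runs numerically and other text case-insensitively."""
    # With a capturing group, re.split puts the digit runs at the odd indices.
    parts = re.split(r"(\d+)", s)
    key = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            key.append((0, int(part), ""))
        elif part:
            key.append((1, 0, part.lower()))
    return key


names = ["X10", "X2", "X1", "Y"]
print(sorted(names))                   # ['X1', 'X10', 'X2', 'Y']
print(sorted(names, key=natural_key))  # ['X1', 'X2', 'X10', 'Y']
```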

comment:2 Changed 6 months ago by gkronber

r14890: removed reference to resx file

comment:3 Changed 6 months ago by gkronber

TODO:

  • Clean up code
  • Add absolute relative error and absolute error
  • Maybe: restrict number of entries in xAxis dropdown box
    • decided against this, as I found it convenient to color rows by residuals and then show a scatter plot of two input variables.
  • Move calculated entries (error, prediction, ...) to the top of the dropdown box
  • Remove entries from the dropdown box that are constant
Last edited 5 months ago by gkronber
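
The error measures from the TODO list can be computed per row as additional columns. The Python sketch below assumes residual = target minus prediction and relative error = residual divided by target, with NaN for rows whose target is zero; HeuristicLab may define these measures differently, and the function `error_columns` and the column names are hypothetical.

```python
import math


def error_columns(target, predictions):
    """Per-row error measures as named columns (definitions assumed, see text)."""
    residual = [t - p for t, p in zip(target, predictions)]  # target minus prediction
    # relative error is undefined for a zero target; NaN marks such rows
    relative = [r / t if t != 0 else math.nan for r, t in zip(residual, target)]
    return {
        "residual": residual,
        "absolute error": [abs(r) for r in residual],
        "relative error": relative,
        "absolute relative error": [abs(r) for r in relative],
    }


cols = error_columns([2.0, 0.0, 4.0], [1.0, 1.0, 5.0])
print(cols["absolute error"])  # [1.0, 1.0, 1.0]
```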

comment:4 follow-up: Changed 6 months ago by abeham

Having such a view for any tabular data would be really awesome. Maybe something like a dataframe in R? The IDataset is already pretty close to a dataframe. And the Dataset could easily be moved to HeuristicLab.Data where it would be more generally usable as a data structure for tabular data. Row names would probably still be nice to have...

comment:5 Changed 5 months ago by gkronber

r14943:

  • cleaned code (use '>' as a marker for calculated variables)
  • added absolute residual and error
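
With a fixed prefix marking calculated variables, listing them first in the dropdown box reduces to a two-part sort key. A minimal Python sketch, assuming hypothetical helpers `mark_calculated` and `order_for_dropdown` and plain case-insensitive ordering within each group (the bubble chart itself uses the NaturalStringComparer):

```python
CALCULATED_MARKER = ">"  # prefix for calculated variables, as described in r14943


def mark_calculated(name):
    """Prefix a calculated variable name (e.g. a residual column) with the marker."""
    return CALCULATED_MARKER + name


def order_for_dropdown(names):
    """Calculated variables (names starting with the marker) first, then all others;
    each group sorted case-insensitively."""
    return sorted(names, key=lambda n: (not n.startswith(CALCULATED_MARKER), n.lower()))


entries = ["x2", mark_calculated("Residual"), "X1", mark_calculated("Prediction")]
print(order_for_dropdown(entries))  # ['>Prediction', '>Residual', 'X1', 'x2']
```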

comment:6 in reply to: ↑ 4 Changed 5 months ago by gkronber

Replying to abeham:

Having such a view for any tabular data would be really awesome. Maybe something like a dataframe in R? The IDataset is already pretty close to a dataframe. And the Dataset could easily be moved to HeuristicLab.Data where it would be more generally usable as a data structure for tabular data. Row names would probably still be nice to have...

Related to efforts in #2726?

comment:7 Changed 5 months ago by abeham

#2726 is based on IndexedDataTable showing algorithm performance over evaluations as a line chart. I don't think it is related.

comment:8 Changed 5 months ago by gkronber

  • Status changed from new to accepted

comment:9 Changed 5 months ago by gkronber

r15024: hide 'constant variables' (only one distinct value)
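
A column with a single distinct value adds nothing as an axis or as a bubble color, so it can be filtered out before the dropdown boxes are filled. A minimal Python sketch, assuming columns are stored as a dict of name to list of hashable values without NaNs (NaN never equals itself, so it would defeat the distinct-value count); `non_constant_columns` is a hypothetical helper:

```python
def non_constant_columns(columns):
    """Keep only columns with more than one distinct value (insertion order preserved)."""
    return {name: values for name, values in columns.items() if len(set(values)) > 1}


data = {"x1": [1, 2, 3], "bias": [1, 1, 1], "empty": []}
print(list(non_constant_columns(data)))  # ['x1']
```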

comment:10 Changed 5 months ago by gkronber

I have used and tested this extensively over the last few weeks and found it really helpful for finding systematic errors in models.

Last edited 5 months ago by gkronber

comment:11 Changed 5 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:12 Changed 4 months ago by mkommend

r15088: Added target as calculated feature to ResidualAnalysisView.

comment:13 Changed 4 months ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

Reviewed all changes in this ticket (r14889, r14890, r14943, r15024, r15088).

comment:14 Changed 4 months ago by mkommend

It would be cool to have the solution evaluation views semantically sorted.

comment:15 Changed 4 months ago by mkommend

  • Owner changed from gkronber to mkommend
  • Status changed from readytorelease to reviewing

comment:16 Changed 4 months ago by mkommend

  • Owner changed from mkommend to gkronber

r15094: Implemented necessary methods in the Dataset and ResidualAnalysisView to handle dates correctly.
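
A chart axis needs numeric values, so date columns have to be mapped to numbers before they can be plotted or used for coloring. The Python sketch below converts dates to fractional days since 1899-12-30, the epoch of .NET's `DateTime.ToOADate`; whether r15094 uses this particular encoding is an assumption, and `to_axis_value` is a hypothetical helper.

```python
from datetime import datetime

OA_EPOCH = datetime(1899, 12, 30)  # epoch of the OLE Automation date format


def to_axis_value(value):
    """Map a cell value to a float for a chart axis: dates become fractional days
    since OA_EPOCH, numbers are passed through as floats."""
    if isinstance(value, datetime):
        return (value - OA_EPOCH).total_seconds() / 86400.0
    return float(value)


print(to_axis_value(datetime(1900, 1, 1, 12)))  # 2.5
print(to_axis_value(3))                          # 3.0
```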

comment:17 Changed 4 months ago by gkronber

  • Status changed from reviewing to readytorelease

Reviewed r15088 and r15094. Thanks for the corresponding changes in IDataset and Dataset.

comment:18 Changed 3 months ago by gkronber

r15161: merged r14889, r14890, r14943, r15024, r15088, r15094 from trunk to stable (all changesets merged)

comment:19 Changed 3 months ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed