Opened 5 years ago
Closed 3 years ago
#2383 closed defect (done)
Some charts for regression solutions throw an exception if the target variable contains missing values
Reported by: | gkronber | Owned by: | gkronber |
---|---|---|---|
Priority: | high | Milestone: | HeuristicLab 3.3.16 |
Component: | Problems.DataAnalysis.Views | Version: | 3.3.11 |
Keywords: | Cc: |
Description (last modified by gkronber)
This could be the case e.g. when only a part of the whole data set is used for training/testing
NaN is not handled correctly by:
- Scatter Plot
- Error Characteristics Curve
- Residual Histogram
Inf. is not handled correctly by:
- Scatter Plot
- Error Characteristics Curve
- Residual Histogram
- Line chart
- Residuals line chart
Attachments (1)
Change History (29)
comment:1 Changed 5 years ago by ascheibe
- Owner set to gkronber
- Status changed from new to assigned
comment:2 Changed 5 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.12 to HeuristicLab 3.3.13
comment:3 Changed 5 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.13 to HeuristicLab 4.0.x Backlog
comment:4 Changed 3 years ago by gkronber
- Milestone changed from HeuristicLab 4.0 to HeuristicLab 4.x Backlog
comment:5 Changed 3 years ago by gkronber
- Priority changed from medium to high
comment:6 Changed 3 years ago by gkronber
- Description modified (diff)
- Summary changed from Line chart for regression solutions throws an exception if the target variable contains missing values to Some charts for regression solutions throw an exception if the target variable contains missing values
This has been fixed for the line chart. However, the problem still remains for the scatter plot and the error characteristics curve.
For the line chart an exception is thrown if the dataset contains +inf. or -inf.
comment:7 Changed 3 years ago by gkronber
- Milestone changed from HeuristicLab 4.x Backlog to HeuristicLab 3.3.16
- Owner changed from gkronber to fholzing
fholzing, could you please fix this in the above mentioned views?
comment:8 Changed 3 years ago by fholzing
- Status changed from assigned to accepted
comment:9 Changed 3 years ago by fholzing
r15785: Added additional filtering logic for double.NaN and double.Inf
comment:10 Changed 3 years ago by fholzing
r15787: Made the filtering a little bit more robust
comment:11 Changed 3 years ago by fholzing
- Owner changed from fholzing to gkronber
- Status changed from accepted to reviewing
comment:12 Changed 3 years ago by gkronber
I tested with the script above and found that the line chart and the residuals line chart works now.
However, the solution is incomplete as the other charts mentioned in the ticket description still produce exceptions. Steps to reproduce:
- Open HL -> New C# Script -> Enter code above, compile and run.
- Open the solution produced by the script
- case (1) Select 'Scatter plot' -> an exception occurs -> expected behavior a scatter plot is shown without the points which are NaN or infinity.
- case (2) Select 'Error Characteristics curve' and select 'All samples' in the 'Samples' drop-down. -> an exception occurs -> expected behavior an ERC is shown whereby the data points which are NaN or infinity are not considered in the calculation
- case (3) Select 'Residual Histogram' -> an exception occurs -> expected behavior: as above
Notes from review of r15785 and r15787:
Line 175 in RegressionSolutionLineChartViewBase.cs
double targetValuesRange = targetValues.Where(v => !double.IsInfinity(v)).Max() - targetValues.Where(v => !double.IsNaN(v) && !double.IsNegativeInfinity(v)).Min();
should also test against NaN in when calculating Max(). The code seems to rely on the fact that comparison of any double with double.NaN is always false. It seems especially smart to skip the NaN comparison but I guess it introduces bugs. So, I strongly recommend to explicitly compare to NaN to make the code more obvious.
(A comment: I wonder if it would work if we remove the NaN comparison when calculating Min()? Another question is whether targetValues.Where(v => !double.IsInfinity(v)).Max() produces the same output for {double.NaN, 1.0, 2.0} and {1.0, double.NaN, 2.0}).
comment:13 Changed 3 years ago by gkronber
- Owner changed from gkronber to fholzing
- Status changed from reviewing to assigned
comment:14 Changed 3 years ago by fholzing
r15789: Added additional filter for Error Characteristics Curve
comment:15 Changed 3 years ago by fholzing
r15790: Added additional NaN-check and also check for both Infinities (+,-) for RegressionSolutionLineChartViewBase
Changed 3 years ago by fholzing
comment:16 Changed 3 years ago by fholzing
I added a simple project which performs basic mathematical operations on Inf and Nan, you should find your questions answered.
Basically: Every Operation (+,-,*,/) of any number (including both Inf) with NaN results in NaN The order is the following: NaN, Inf-, <numbers in their natural order>, Inf+ So .Min will result in NaN and .Max in Inf+ (if they occur), otherwise the next/previous element in the order will be taken.
comment:17 Changed 3 years ago by fholzing
I can't reproduce case (1) and case(3), case (2) should be fixed. I followed your instruction, but wasn't able to provoke an exception (also played around with the script, like changing NaN to both kinds of Infinity). Please check if you have the latest version.
comment:18 Changed 3 years ago by fholzing
- Owner changed from fholzing to gkronber
- Status changed from assigned to reviewing
comment:19 Changed 3 years ago by gkronber
Sorry, this was indeed my fault. My working copy was still switched to a branch. I tested the charts and have not found an error.
comment:20 Changed 3 years ago by fholzing
- Owner changed from gkronber to fholzing
- Status changed from reviewing to assigned
comment:21 Changed 3 years ago by gkronber
Florian, if you are ready please assign it back to me for the final review and we can move it to ready-to-release.
comment:22 Changed 3 years ago by fholzing
Additional note from mkommend: Currently the LineCharts just "Hide" Nan/Infs, therefore no gap is visible -> this should be changed.
comment:23 Changed 3 years ago by fholzing
r15793: LineCharts will now show a gap instead of hiding NaN/Inf
comment:24 Changed 3 years ago by fholzing
- Owner changed from fholzing to gkronber
- Status changed from assigned to reviewing
comment:25 Changed 3 years ago by gkronber
r15810: made some changes while reviewing.
Beware list.Remove(elem) removes only the first occurrence of elem.
comment:26 Changed 3 years ago by gkronber
- Status changed from reviewing to readytorelease
comment:27 Changed 3 years ago by gkronber
comment:28 Changed 3 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
Test script: