Free cookie consent management tool by TermsFeed Policy Generator

Opened 10 years ago

Closed 7 years ago

#2383 closed defect (done)

Some charts for regression solutions throw an exception if the target variable contains missing values

Reported by: gkronber Owned by: gkronber
Priority: high Milestone: HeuristicLab 3.3.16
Component: Problems.DataAnalysis.Views Version: 3.3.11
Keywords: Cc:

Description (last modified by gkronber)

This could be the case e.g. when only a part of the whole data set is used for training/testing

NaN is not handled correctly by:

  • Scatter Plot
  • Error Characteristics Curve
  • Residual Histogram

Inf. is not handled correctly by:

  • Scatter Plot
  • Error Characteristics Curve
  • Residual Histogram
  • Line chart
  • Residuals line chart

Attachments (1)

DoubleExperiment.7z (7.6 KB) - added by fholzing 7 years ago.

Download all attachments as: .zip

Change History (29)

comment:1 Changed 10 years ago by ascheibe

  • Owner set to gkronber
  • Status changed from new to assigned

comment:2 Changed 9 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.12 to HeuristicLab 3.3.13

comment:3 Changed 9 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.13 to HeuristicLab 4.0.x Backlog

comment:4 Changed 7 years ago by gkronber

  • Milestone changed from HeuristicLab 4.0 to HeuristicLab 4.x Backlog

comment:5 Changed 7 years ago by gkronber

  • Priority changed from medium to high

Test script:

using System;
using System.Linq;
using System.Collections.Generic;                            
using System.Collections;

using HeuristicLab.Core;
using HeuristicLab.Common;
using HeuristicLab.Problems.DataAnalysis;
using HeuristicLab.Algorithms.DataAnalysis;

public class MyScript : HeuristicLab.Scripting.CSharpScriptBase {
  public override void Main() {
    var rand = new Random(1234);
    var xs = Enumerable.Range(1, 100).Select(xi=> (double)xi).ToList();
    var ys = xs.Select(_ => rand.NextDouble()).ToList();
    ys[5] = double.NaN;          
    
    var ds = new Dataset(new string[] {"x", "y"}, new IList[] {xs, ys} );
    vars["ds"] = ds;
    var probData = new RegressionProblemData(ds, new string[] {"x"}, "y");
    probData.TrainingPartition.Start = 10;
    probData.TestPartition.End = 90;
    
    double rmsError, cvRmsError;
    var sol = LinearRegression.CreateLinearRegressionSolution(probData, out rmsError, out cvRmsError);
    vars["sol"] = sol;
    
  }
}

comment:6 Changed 7 years ago by gkronber

  • Description modified (diff)
  • Summary changed from Line chart for regression solutions throws an exception if the target variable contains missing values to Some charts for regression solutions throw an exception if the target variable contains missing values

This has been fixed for the line chart. However, the problem still remains for the scatter plot and the error characteristics curve.

For the line chart an exception is thrown if the dataset contains +inf. or -inf.

comment:7 Changed 7 years ago by gkronber

  • Milestone changed from HeuristicLab 4.x Backlog to HeuristicLab 3.3.16
  • Owner changed from gkronber to fholzing

fholzing, could you please fix this in the above mentioned views?

comment:8 Changed 7 years ago by fholzing

  • Status changed from assigned to accepted

comment:9 Changed 7 years ago by fholzing

r15785: Added additional filtering logic for double.NaN and double.Inf

comment:10 Changed 7 years ago by fholzing

r15787: Made the filtering a little bit more robust

comment:11 Changed 7 years ago by fholzing

  • Owner changed from fholzing to gkronber
  • Status changed from accepted to reviewing

comment:12 Changed 7 years ago by gkronber

I tested with the script above and found that the line chart and the residuals line chart works now.

However, the solution is incomplete as the other charts mentioned in the ticket description still produce exceptions. Steps to reproduce:

  • Open HL -> New C# Script -> Enter code above, compile and run.
  • Open the solution produced by the script
  • case (1) Select 'Scatter plot' -> an exception occurs -> expected behavior a scatter plot is shown without the points which are NaN or infinity.
  • case (2) Select 'Error Characteristics curve' and select 'All samples' in the 'Samples' drop-down. -> an exception occurs -> expected behavior an ERC is shown whereby the data points which are NaN or infinity are not considered in the calculation
  • case (3) Select 'Residual Histogram' -> an exception occurs -> expected behavior: as above

Notes from review of r15785 and r15787:

Line 175 in RegressionSolutionLineChartViewBase.cs

double targetValuesRange = targetValues.Where(v => !double.IsInfinity(v)).Max() - targetValues.Where(v => !double.IsNaN(v) && !double.IsNegativeInfinity(v)).Min();

should also test against NaN in when calculating Max(). The code seems to rely on the fact that comparison of any double with double.NaN is always false. It seems especially smart to skip the NaN comparison but I guess it introduces bugs. So, I strongly recommend to explicitly compare to NaN to make the code more obvious.

(A comment: I wonder if it would work if we remove the NaN comparison when calculating Min()? Another question is whether targetValues.Where(v => !double.IsInfinity(v)).Max() produces the same output for {double.NaN, 1.0, 2.0} and {1.0, double.NaN, 2.0}).

comment:13 Changed 7 years ago by gkronber

  • Owner changed from gkronber to fholzing
  • Status changed from reviewing to assigned

comment:14 Changed 7 years ago by fholzing

r15789: Added additional filter for Error Characteristics Curve

Last edited 7 years ago by fholzing (previous) (diff)

comment:15 Changed 7 years ago by fholzing

r15790: Added additional NaN-check and also check for both Infinities (+,-) for RegressionSolutionLineChartViewBase

Changed 7 years ago by fholzing

comment:16 Changed 7 years ago by fholzing

I added a simple project which performs basic mathematical operations on Inf and Nan, you should find your questions answered.

Basically: Every Operation (+,-,*,/) of any number (including both Inf) with NaN results in NaN The order is the following: NaN, Inf-, <numbers in their natural order>, Inf+ So .Min will result in NaN and .Max in Inf+ (if they occur), otherwise the next/previous element in the order will be taken.

comment:17 Changed 7 years ago by fholzing

I can't reproduce case (1) and case(3), case (2) should be fixed. I followed your instruction, but wasn't able to provoke an exception (also played around with the script, like changing NaN to both kinds of Infinity). Please check if you have the latest version.

comment:18 Changed 7 years ago by fholzing

  • Owner changed from fholzing to gkronber
  • Status changed from assigned to reviewing

comment:19 Changed 7 years ago by gkronber

Sorry, this was indeed my fault. My working copy was still switched to a branch. I tested the charts and have not found an error.

comment:20 Changed 7 years ago by fholzing

  • Owner changed from gkronber to fholzing
  • Status changed from reviewing to assigned

comment:21 Changed 7 years ago by gkronber

Florian, if you are ready please assign it back to me for the final review and we can move it to ready-to-release.

comment:22 Changed 7 years ago by fholzing

Additional note from mkommend: Currently the LineCharts just "Hide" Nan/Infs, therefore no gap is visible -> this should be changed.

comment:23 Changed 7 years ago by fholzing

r15793: LineCharts will now show a gap instead of hiding NaN/Inf

comment:24 Changed 7 years ago by fholzing

  • Owner changed from fholzing to gkronber
  • Status changed from assigned to reviewing

comment:25 Changed 7 years ago by gkronber

r15810: made some changes while reviewing.

Beware list.Remove(elem) removes only the first occurrence of elem.

comment:26 Changed 7 years ago by gkronber

  • Status changed from reviewing to readytorelease

comment:27 Changed 7 years ago by gkronber

r15811: merged r15785,r15787,r15789,r15790,r15793,r15810 from trunk to stable.

fholzing, thanks for fixing this!

comment:28 Changed 7 years ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed
Note: See TracTickets for help on using tickets.