Opened 2 weeks ago

Last modified 2 weeks ago

#3115 reviewing defect

Method SolutionsCompatible in PartialDependencePlotView is slow for datasets with string variables

Reported by: gkronber
Owned by: mkommend
Priority: medium
Milestone: HeuristicLab 3.3.17
Component: Problems.DataAnalysis.Symbolic.Regression.Views
Version: trunk
Keywords:
Cc:

Description

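Inefficient code in the solution compatibility check blocks the GUI when a second solution is drawn into an existing partial dependence plot.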

Steps to reproduce:

  1. load a dataset with at least one string variable
  2. create two independent regression solutions
  3. open the partial dependence plot for one solution and draw the other solution on top of it

The relevant code is:

    private static bool SolutionsCompatible(IEnumerable<IRegressionSolution> solutions) {
      var refSolution = solutions.First();
      var refSolVars = refSolution.ProblemData.Dataset.VariableNames;
      foreach (var solution in solutions.Skip(1)) {
        // every variable of the solution must also exist in the reference dataset
        var variables1 = solution.ProblemData.Dataset.VariableNames;
        if (!variables1.All(refSolVars.Contains))
          return false;

        // every value of a string (factor) variable must also occur in the reference dataset
        foreach (var factorVar in variables1.Where(solution.ProblemData.Dataset.VariableHasType<string>)) {
          // slow part: Distinct() is lazy, so distinctVals is re-enumerated by each Contains call below
          var distinctVals = refSolution.ProblemData.Dataset.GetStringValues(factorVar).Distinct();
          if (solution.ProblemData.Dataset.GetStringValues(factorVar).Any(val => !distinctVals.Contains(val))) return false;
        }
      }
      return true;
    }
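
One possible way to avoid the repeated enumeration is to materialize the reference values of each factor variable into a HashSet once and reuse it for all solutions. The sketch below is self-contained and does not use the HeuristicLab API: StringColumns is a hypothetical stand-in for a dataset that holds only string columns, and CompatibilityCheck is an illustrative name. It is not necessarily what the actual fix does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a dataset with only string (factor) columns.
// This is NOT the HeuristicLab IDataset interface.
public sealed class StringColumns {
  private readonly Dictionary<string, string[]> columns;
  public StringColumns(Dictionary<string, string[]> columns) { this.columns = columns; }
  public IEnumerable<string> VariableNames => columns.Keys;
  public IEnumerable<string> GetStringValues(string name) => columns[name];
}

public static class CompatibilityCheck {
  // The first dataset is the reference; all others are checked against it.
  public static bool SolutionsCompatible(IReadOnlyList<StringColumns> datasets) {
    // Fewer than two datasets: nothing to compare.
    if (datasets.Count < 2) return true;

    var reference = datasets[0];
    var refVars = new HashSet<string>(reference.VariableNames);

    // Cache: one set of reference values per factor variable, built on first use.
    var refValues = new Dictionary<string, HashSet<string>>();

    foreach (var other in datasets.Skip(1)) {
      foreach (var name in other.VariableNames) {
        // The variable must exist in the reference dataset.
        if (!refVars.Contains(name)) return false;

        if (!refValues.TryGetValue(name, out var allowed)) {
          allowed = new HashSet<string>(reference.GetStringValues(name));
          refValues[name] = allowed;
        }

        // Hash lookup per value instead of re-enumerating the reference column.
        if (other.GetStringValues(name).Any(v => !allowed.Contains(v))) return false;
      }
    }
    return true;
  }
}
```

With this structure, each reference column is scanned once per call and each value of the other datasets costs one hash lookup, so the work grows roughly linearly with the number of rows instead of quadratically.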

Change History (3)

comment:1 Changed 2 weeks ago by gkronber

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.17
  • Owner set to gkronber
  • Status changed from new to accepted
  • Version set to trunk

comment:2 Changed 2 weeks ago by gkronber

r17920: cache calculations to improve speed when plotting multiple solutions in a partial dependence plot with factor variables.

comment:3 Changed 2 weeks ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing