Opened 7 months ago

Closed 6 weeks ago

#3115 closed defect (done)

Method IsSolutionCompatible in PartialDependencePlotView is slow for datasets with string variables

Reported by: gkronber
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.17
Component: Problems.DataAnalysis.Symbolic.Regression.Views
Version: trunk
Keywords:
Cc:

Description

Inefficient code in the solution compatibility check blocks the GUI when several solutions are drawn in the same partial dependence plot.

Steps to reproduce:

  1. load a dataset with at least one string variable
  2. create two independent regression solutions
  3. open the partial dependence plot for one solution and draw the other solution on top of it

The relevant code is:

    private static bool SolutionsCompatible(IEnumerable<IRegressionSolution> solutions) {
      var refSolution = solutions.First();
      var refSolVars = refSolution.ProblemData.Dataset.VariableNames;
      foreach (var solution in solutions.Skip(1)) {
        var variables1 = solution.ProblemData.Dataset.VariableNames;
        if (!variables1.All(refSolVars.Contains))
          return false;

        foreach (var factorVar in variables1.Where(solution.ProblemData.Dataset.VariableHasType<string>)) {
          var distinctVals = refSolution.ProblemData.Dataset.GetStringValues(factorVar).Distinct();
          if (solution.ProblemData.Dataset.GetStringValues(factorVar).Any(val => !distinctVals.Contains(val))) return false;
        }
      }
      return true;
    }
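
One likely reason for the slowdown is that `distinctVals` is a lazy `Distinct()` sequence, so every `Contains` call inside `Any` re-enumerates the reference column instead of consulting a set that was built once. The following is a minimal, self-contained sketch of that effect on plain string lists, not the HeuristicLab code itself. The class `LazyDistinctDemo`, its methods, and the read counter are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of HeuristicLab.
public static class LazyDistinctDemo {
  // Same pattern as in the ticket: a lazy Distinct() that is queried once
  // per candidate value. 'Reads' counts how often a reference element is read.
  public static (bool Compatible, long Reads) CheckLazy(
      IReadOnlyList<string> reference, IReadOnlyList<string> candidate) {
    long reads = 0;
    // The Select only exists to count reads of the reference column.
    var distinctVals = reference.Select(v => { reads++; return v; }).Distinct();
    bool compatible = !candidate.Any(val => !distinctVals.Contains(val));
    return (compatible, reads);
  }

  // Same check, but the distinct values are materialized once in a HashSet,
  // so each later lookup is a hash probe instead of a scan.
  public static (bool Compatible, long Reads) CheckHashed(
      IReadOnlyList<string> reference, IReadOnlyList<string> candidate) {
    long reads = 0;
    var distinctVals = new HashSet<string>(reference.Select(v => { reads++; return v; }));
    bool compatible = candidate.All(distinctVals.Contains);
    return (compatible, reads);
  }
}
```

In this sketch the hashed variant reads the reference column exactly once, while the lazy variant reads it again for every candidate value, which grows with the product of both column lengths in the worst case.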

Change History (9)

comment:1 Changed 7 months ago by gkronber

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.17
  • Owner set to gkronber
  • Status changed from new to accepted
  • Version set to trunk

comment:2 Changed 7 months ago by gkronber

r17920: cache calculations to improve speed when plotting multiple solutions in a partial dependence plot with factor variables.

comment:3 Changed 7 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:4 Changed 6 months ago by mkommend

r17938: Corrected typo in RegressionSolutoinPDPView.

comment:5 Changed 6 months ago by mkommend

  • Owner changed from mkommend to gkronber

r17939: Simplified source code for checking compatibility of solutions by using Hashset and IsSubset method in PDP controls.
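
For readers without access to the changeset, a set-based compatibility check along the lines described in r17939 could look roughly like the sketch below. It uses the standard `HashSet<T>.IsSubsetOf` method and is an assumption about the general approach, not the committed code. The class `CompatibilitySketch` and the dictionary-based stand-in for a dataset (variable name mapped to a string column) are hypothetical, and the sketch treats every column as a string variable.

```csharp
using System.Collections.Generic;

// Hypothetical sketch; not the code from r17939.
public static class CompatibilitySketch {
  // Each dictionary is a stand-in for a dataset: variable name -> string column.
  // Returns true if every variable of 'other' exists in 'reference' and every
  // value of each such variable also occurs in the reference column.
  public static bool ColumnsCompatible(
      Dictionary<string, string[]> reference,
      Dictionary<string, string[]> other) {
    var refNames = new HashSet<string>(reference.Keys);
    if (!new HashSet<string>(other.Keys).IsSubsetOf(refNames)) return false;

    foreach (var pair in other) {
      // Build the set of reference values once per variable.
      var refValues = new HashSet<string>(reference[pair.Key]);
      if (!new HashSet<string>(pair.Value).IsSubsetOf(refValues)) return false;
    }
    return true;
  }
}
```

Building each set once keeps the work roughly linear in the column lengths, in contrast to repeated scans of the reference column.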

comment:6 Changed 3 months ago by gkronber

Reviewed r17938 and r17939

Last edited 3 months ago by gkronber

comment:7 Changed 3 months ago by gkronber

  • Status changed from reviewing to readytorelease

I tested the implementation and could not reproduce the performance problem anymore.

comment:8 Changed 3 months ago by gkronber

r18021: merged r17920,r17938,r17939 from trunk to stable

comment:9 Changed 6 weeks ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed