Opened 3 years ago

Closed 3 years ago

#3115 closed defect (done)

Method IsSolutionCompatible in PartialDependencePlotView is slow for datasets with string variables

Reported by: gkronber
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.17
Component: Problems.DataAnalysis.Symbolic.Regression.Views
Version: trunk
Keywords:
Cc:

Description

Inefficient code in this method blocks the GUI.

Steps to reproduce:

  1. Load a dataset with at least one string variable.
  2. Create two independent regression solutions.
  3. Open the partial dependence plot for one solution and draw the other solution on top of it.

The relevant code is:

    private static bool SolutionsCompatible(IEnumerable<IRegressionSolution> solutions) {
      var refSolution = solutions.First();
      var refSolVars = refSolution.ProblemData.Dataset.VariableNames;
      foreach (var solution in solutions.Skip(1)) {
        var variables1 = solution.ProblemData.Dataset.VariableNames;
        if (!variables1.All(refSolVars.Contains))
          return false;

        foreach (var factorVar in variables1.Where(solution.ProblemData.Dataset.VariableHasType<string>)) {
          var distinctVals = refSolution.ProblemData.Dataset.GetStringValues(factorVar).Distinct();
          if (solution.ProblemData.Dataset.GetStringValues(factorVar).Any(val => !distinctVals.Contains(val))) return false;
        }
      }
      return true;
    }
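
A likely cause, judging only from the snippet above: `distinctVals` is a deferred `Distinct()` query, so each `distinctVals.Contains(val)` call inside `Any` walks the reference column again, and the cost grows with the product of the two column lengths. The following self-contained sketch illustrates this with plain string lists instead of the HeuristicLab dataset API. `DeferredDistinctDemo`, `Counted`, `SlowCheck`, and `FastCheck` are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDistinctDemo {
  // Number of elements read from the reference column so far.
  public static long Reads;

  // Wraps a column so that every element read from it is counted.
  private static IEnumerable<string> Counted(IEnumerable<string> column) {
    foreach (var value in column) {
      Reads++;
      yield return value;
    }
  }

  // Same pattern as in the ticket: a deferred Distinct() query is probed once per value.
  public static bool SlowCheck(IList<string> reference, IList<string> other) {
    var distinctVals = Counted(reference).Distinct();
    return !other.Any(val => !distinctVals.Contains(val));
  }

  // Materializes the distinct values once; each probe is then a hash lookup.
  public static bool FastCheck(IList<string> reference, IList<string> other) {
    var distinctVals = new HashSet<string>(Counted(reference));
    return !other.Any(val => !distinctVals.Contains(val));
  }
}
```

Both methods return the same answer. Resetting `Reads` and comparing its value after each call shows that the first variant reads the reference column once per probed value, while the second reads it exactly once.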

Change History (9)

comment:1 Changed 3 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.17
  • Owner set to gkronber
  • Status changed from new to accepted
  • Version set to trunk

comment:2 Changed 3 years ago by gkronber

r17920: cache calculations to improve speed when plotting multiple solutions in a partial dependence plot with factor variables.

comment:3 Changed 3 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:4 Changed 3 years ago by mkommend

r17938: Corrected typo in RegressionSolutoinPDPView.

comment:5 Changed 3 years ago by mkommend

  • Owner changed from mkommend to gkronber

r17939: Simplified the source code for checking the compatibility of solutions by using HashSet and its IsSubsetOf method in the PDP controls.
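
A minimal sketch of what a subset-based compatibility check could look like. This is not the code committed in r17939: it works on plain collections instead of `IRegressionSolution`, and `CompatibilitySketch`, `DatasetInfo`, and `DatasetsCompatible` are hypothetical names. It also assumes that a factor variable which is not a string variable in the reference dataset counts as incompatible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompatibilitySketch {
  // Hypothetical stand-in for a dataset: all variable names plus the values of each string (factor) variable.
  public sealed class DatasetInfo {
    public List<string> VariableNames = new List<string>();
    public Dictionary<string, List<string>> StringValues = new Dictionary<string, List<string>>();
  }

  // The first dataset is the reference; every other dataset must only use
  // variables and factor values that also occur in the reference.
  public static bool DatasetsCompatible(IReadOnlyList<DatasetInfo> datasets) {
    var reference = datasets[0];
    var refVars = new HashSet<string>(reference.VariableNames);
    // Distinct reference values per factor variable, built once on first use.
    var refValues = new Dictionary<string, HashSet<string>>();

    foreach (var dataset in datasets.Skip(1)) {
      if (!new HashSet<string>(dataset.VariableNames).IsSubsetOf(refVars)) return false;

      foreach (var factor in dataset.StringValues) {
        if (!refValues.TryGetValue(factor.Key, out var allowed)) {
          // Assumption: a factor variable missing from the reference's string variables is incompatible.
          if (!reference.StringValues.TryGetValue(factor.Key, out var column)) return false;
          allowed = new HashSet<string>(column);
          refValues[factor.Key] = allowed;
        }
        if (!new HashSet<string>(factor.Value).IsSubsetOf(allowed)) return false;
      }
    }
    return true;
  }
}
```

Because each reference set is built once and probed with hash lookups, the cost is roughly linear in the total number of values rather than quadratic.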

comment:6 Changed 3 years ago by gkronber

Reviewed r17938 and r17939.

comment:7 Changed 3 years ago by gkronber

  • Status changed from reviewing to readytorelease

I tested the implementation and could not reproduce the performance problem anymore.

comment:8 Changed 3 years ago by gkronber

r18021: merged r17920, r17938, r17939 from trunk to stable

comment:9 Changed 3 years ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed