Opened 4 years ago
Closed 3 years ago
#3115 closed defect (done)
Method IsSolutionCompatible in PartialDependencePlotView is slow for datasets with string variables
Reported by: | gkronber | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.17 |
Component: | Problems.DataAnalysis.Symbolic.Regression.Views | Version: | trunk |
Keywords: | | Cc: | |
Description
Inefficient code in the compatibility check blocks the GUI.
Steps to reproduce:
- load a dataset with at least one string variable
- create two independent regression solutions
- open the partial dependence plot for one solution and draw the other solution on top of it
The relevant code is:
    private static bool SolutionsCompatible(IEnumerable<IRegressionSolution> solutions) {
      var refSolution = solutions.First();
      var refSolVars = refSolution.ProblemData.Dataset.VariableNames;
      foreach (var solution in solutions.Skip(1)) {
        var variables1 = solution.ProblemData.Dataset.VariableNames;
        if (!variables1.All(refSolVars.Contains))
          return false;

        foreach (var factorVar in variables1.Where(solution.ProblemData.Dataset.VariableHasType<string>)) {
          var distinctVals = refSolution.ProblemData.Dataset.GetStringValues(factorVar).Distinct();
          if (solution.ProblemData.Dataset.GetStringValues(factorVar).Any(val => !distinctVals.Contains(val)))
            return false;
        }
      }
      return true;
    }
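One plausible reading of the snippet, offered as an interpretation and not a profiled result: `Distinct()` is evaluated lazily, so `distinctVals.Contains(val)` inside the `Any` lambda walks the reference column again for every value it checks. The sketch below reproduces that pattern on plain string lists instead of HeuristicLab datasets and counts how many elements are read from the reference column. The class `LazyDistinctDemo`, the helper `CountingSource`, and both method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDistinctDemo {
  // Hypothetical helper: yields the items and counts every element that is read.
  public static IEnumerable<string> CountingSource(IList<string> items, int[] reads) {
    foreach (var item in items) {
      reads[0]++;
      yield return item;
    }
  }

  // Mirrors the pattern in the ticket: Distinct() stays lazy and is re-run per lookup.
  public static int ReadsWithLazyDistinct(IList<string> reference, IList<string> other) {
    var reads = new int[1];
    var distinctVals = CountingSource(reference, reads).Distinct();
    other.Any(val => !distinctVals.Contains(val));
    return reads[0];
  }

  // Materializes the distinct values once; each lookup is then a hash probe.
  public static int ReadsWithHashSet(IList<string> reference, IList<string> other) {
    var reads = new int[1];
    var distinctVals = new HashSet<string>(CountingSource(reference, reads));
    other.Any(val => !distinctVals.Contains(val));
    return reads[0];
  }
}
```

With the set-based variant the reference column is read exactly once, regardless of how many values are looked up. With the lazy variant the number of reads grows with the number of lookups, which would be consistent with the blocking behaviour described above for large datasets.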
Change History (9)
comment:1 Changed 4 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.17
- Owner set to gkronber
- Status changed from new to accepted
- Version set to trunk
comment:2 Changed 4 years ago by gkronber
comment:3 Changed 4 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from accepted to reviewing
comment:4 Changed 4 years ago by mkommend
r17938: Corrected typo in RegressionSolutoinPDPView.
comment:5 Changed 4 years ago by mkommend
- Owner changed from mkommend to gkronber
r17939: Simplified source code for checking compatibility of solutions by using Hashset and IsSubset method in PDP controls.
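The idea behind a set-based compatibility check can be sketched as follows. This is not the code from r17939, which is not shown in this ticket; it is a minimal illustration under stated assumptions. A dataset is modelled as a hypothetical dictionary from variable name to string values, every column is treated as a string column (the original code filters with `VariableHasType<string>`), and the names `CompatibilitySketch` and `IsCompatible` are made up. The only library call relied on is the standard `HashSet<T>.IsSubsetOf`.

```csharp
using System;
using System.Collections.Generic;

public static class CompatibilitySketch {
  // Hypothetical stand-in for a dataset: variable name -> string values of that column.
  public static bool IsCompatible(
      IDictionary<string, IList<string>> reference,
      IDictionary<string, IList<string>> other) {
    // Every variable of the other dataset must exist in the reference dataset.
    var refNames = new HashSet<string>(reference.Keys);
    if (!new HashSet<string>(other.Keys).IsSubsetOf(refNames)) return false;

    foreach (var entry in other) {
      // Build both value sets once per variable, then compare them as sets.
      var refValues = new HashSet<string>(reference[entry.Key]);
      var otherValues = new HashSet<string>(entry.Value);
      if (!otherValues.IsSubsetOf(refValues)) return false;
    }
    return true;
  }
}
```

Building each set costs one pass over the column, and the subset test then runs in time roughly proportional to the smaller set, so the nested re-enumeration of the original snippet does not occur.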
comment:6 Changed 3 years ago by gkronber
Reviewed r17938 and r17939.
comment:7 Changed 3 years ago by gkronber
- Status changed from reviewing to readytorelease
I tested the implementation and could no longer reproduce the performance problem.
comment:9 Changed 3 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
r17920: cache calculations to improve speed when plotting multiple solutions in a partial dependence plot with factor variables.