Opened 5 years ago

Closed 5 years ago

#1695 closed enhancement (done)

Include only the name of operators in a run

Reported by: abeham Owned by: abeham
Priority: medium Milestone: HeuristicLab 3.3.7
Component: Optimization Version: 3.3.7
Keywords: Cc:


When a run is created, it includes the results as well as the parameters of the algorithm and the problem. If such a parameter happens to be an operator, the operator is included in the run as a clone.

I think it should be enough if we only include the operator name. For analysis purposes in our RunCollectionViews only the name of the operator is important.

Is there a scenario where we need to store the operator as a full clone of its instance?

For a GA with 1000 generations I compared the RunCollection when storing 100 runs (the same run cloned 100 times), once with operator clones and once without. Removing the clones reduces the compressed file size by 28% and the uncompressed size by 60%, in absolute terms from 16.0 MB down to 6.3 MB uncompressed. With the operators stored as strings instead of being dropped entirely, the size would of course rise a little again.

Change History (11)

comment:1 Changed 5 years ago by abeham

I looked a bit deeper into the issue by examining the size of the serialized XML of a single run.

I took one run of the GA-TSP sample where Coordinates, DistanceMatrix, and BestSolution have not been collected. The run is copied to a run collection and saved. Then the operators are removed by a RunCollection modifier, the modifier is removed, and the run is saved again. The two files are then compared:

  • With the operators included the file has 2029 lines, compared to just 636 when the operators are removed
  • The Analyzer operator needs 1080 lines, which is 53% of the whole file
    • within the Analyzer the biggest part is the BestAverageWorstQualityAnalyzer, which accounts for 32% of all lines
  • Selector, Crossover, Manipulator, SolutionCreator, and Evaluator together take 17% of all lines
  • So the operators contribute about 70% of the serialized XML size of this single run

However, the contribution to file size varies greatly between the operators: the non-analyzer operators ranged from 40 to 120 lines each. A result consisting of a single string uses just 4 lines, so there is considerable saving potential. I did not look into time differences.
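The shares quoted above can be re-derived from the line counts. The following is only a small arithmetic check in Python, using the numbers from this comment; the variable and function names are made up for illustration.

```python
# Line counts quoted in this comment (serialized XML of a single GA-TSP run).
total_with_operators = 2029
total_without_operators = 636
analyzer_lines = 1080

# Lines that disappear when the operators are removed from the run.
operator_lines = total_with_operators - total_without_operators


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


operator_share = share(operator_lines, total_with_operators)
analyzer_share = share(analyzer_lines, total_with_operators)

print(f"operators: {operator_lines} lines, {operator_share}% of the file")
print(f"analyzer:  {analyzer_lines} lines, {analyzer_share}% of the file")
```

This gives roughly 69% for all operators and 53% for the Analyzer, which matches the rounded figures in the list.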

comment:2 Changed 5 years ago by swagner

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.6

comment:3 Changed 5 years ago by abeham

Unfortunately it's not as easy to implement as I thought. CollectParameterValues is implemented in ParameterizedNamedItem, which resides in HeuristicLab.Core. Adding the name of an operator instead of the whole operator would require creating a new StringValue with the name, but StringValue is a type from HeuristicLab.Data.

We could

  • reimplement the functionality of CollectParameterValues in Algorithm, Problem, and Operator, and possibly other types that are ParameterizedNamedItems and that can contain operators.
  • add a method IItem GetCollectedValue() to IParameterizedItem or even IItem which returns this by default and which is overridden in Operator to return new StringValue(Name)
  • delay the ticket and merge Core and Data

Other options?
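To make the second option more concrete, here is a rough sketch of the idea in Python rather than C#. It is not HeuristicLab code: the classes are simplified stand-ins, `IntValue` and the operator name "OrderCrossover" are example values only, and the sketch ignores the Core/Data assembly split that makes this awkward in practice.

```python
class Item:
    """Hypothetical stand-in for IItem."""

    def get_collected_value(self):
        # Default behaviour: the item itself is what gets stored in the run.
        return self


class StringValue(Item):
    """Stand-in for the StringValue data type."""

    def __init__(self, value):
        self.value = value


class IntValue(Item):
    """Stand-in for an ordinary (non-operator) parameter value."""

    def __init__(self, value):
        self.value = value


class Operator(Item):
    """Stand-in for an operator; only its name matters here."""

    def __init__(self, name):
        self.name = name

    def get_collected_value(self):
        # Override: store only the name instead of the operator itself.
        return StringValue(self.name)


# Example parameters of an algorithm (values chosen for illustration).
params = {
    "PopulationSize": IntValue(100),
    "Crossover": Operator("OrderCrossover"),
}

# The collecting code no longer needs to know about operators at all.
collected = {name: value.get_collected_value() for name, value in params.items()}
```

With this shape, ordinary values pass through unchanged and only operators are replaced by their names.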

comment:4 Changed 5 years ago by swagner

  • Milestone changed from HeuristicLab 3.3.6 to HeuristicLab 3.3.7

Thanks abeham for your thoughts on this issue. We should spend some more time discussing how to implement this. Therefore I am moving this ticket to 3.3.7 for now.

comment:5 Changed 5 years ago by abeham

  • Owner changed from swagner to abeham
  • Status changed from new to accepted

comment:6 Changed 5 years ago by abeham

r7579: I solved this issue now. I found that CollectParameterValues was too monolithic: you had to override and re-implement the whole method if you wanted to change just one detail (in this case, that operators are stored by their name). So I split CollectParameterValues into two separate logical parts:

  • CollectParameterValues is iterating over the parameters
  • GetCollectedValues decides what values are collected from the given parameter value

Algorithm and Problem now override only GetCollectedValues and reuse the implementation of the base class, merely filtering its values: when they see an IOperator, they convert it to its name instead. Using IEnumerable and yield, I think that's a nice solution.
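The split described above can be illustrated with a Python generator, which plays the role of IEnumerable plus yield. This is only a sketch of the pattern and not the code from r7579: the method signatures, the dictionary-based parameters, and the operator name "TournamentSelector" are assumptions for illustration.

```python
class Operator:
    """Stand-in for an IOperator; only its name matters here."""

    def __init__(self, name):
        self.name = name


class ParameterizedNamedItem:
    def __init__(self, parameters):
        self.parameters = parameters  # maps parameter name -> value

    def collect_parameter_values(self, values):
        # Part 1: iterate over the parameters and store whatever
        # get_collected_values decides to hand back.
        for name, value in self.parameters.items():
            for key, collected in self.get_collected_values(name, value):
                values[key] = collected

    def get_collected_values(self, name, value):
        # Part 2: decide what is collected for one parameter value.
        # The base class collects the value unchanged.
        yield name, value


class Algorithm(ParameterizedNamedItem):
    def get_collected_values(self, name, value):
        # Reuse the base implementation and only filter its output:
        # operators are replaced by their name.
        for key, collected in super().get_collected_values(name, value):
            if isinstance(collected, Operator):
                yield key, collected.name
            else:
                yield key, collected


run = {}
ga = Algorithm({"Generations": 1000, "Selector": Operator("TournamentSelector")})
ga.collect_parameter_values(run)
print(run)
```

The subclass never touches the iteration logic, so a change to how one kind of value is stored stays local to get_collected_values.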

comment:7 Changed 5 years ago by abeham

  • Owner changed from abeham to mkommend
  • Status changed from accepted to reviewing

Please forward this to swagner if you find no further issues.

comment:8 Changed 5 years ago by abeham

  • Owner changed from mkommend to abeham
  • Status changed from reviewing to assigned

I found a bug where parameters of the operators were not added to the run if the operator itself was not added to the run.

comment:9 Changed 5 years ago by abeham

  • Owner changed from abeham to mkommend
  • Status changed from assigned to reviewing

r7706: fixed bug in parameter collection

comment:10 Changed 5 years ago by mkommend

  • Owner changed from mkommend to abeham
  • Status changed from reviewing to readytorelease

Thanks for implementing this enhancement.

comment:11 Changed 5 years ago by mkommend

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.5 to 3.3.7