Opened 13 years ago
Closed 12 years ago
#1695 closed enhancement (done)
Include only the name of operators in a run
| Reported by: | abeham | Owned by: | abeham |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.7 |
| Component: | Optimization | Version: | 3.3.7 |
| Keywords: | | Cc: | |
Description
When a run is created, it includes the results as well as the parameters of the algorithm and the problem. If such a parameter happens to be an operator, the operator is included in the run as a clone.
I think it should be enough if we only include the operator name. For analysis purposes in our RunCollectionViews only the name of the operator is important.
Is there a scenario where we need to store the operator as a full clone of its instance?
For a GA with 1000 generations, I compared the size of a RunCollection storing 100 runs (the same run cloned 100 times), once with operator clones and once without. Leaving out the operators reduces the compressed file size by 28% and the uncompressed size by 60%, in absolute terms from 16.0 MB down to 6.3 MB uncompressed. With the operators stored as strings rather than left out entirely, the size would of course rise a little.
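As a quick sanity check of the uncompressed figures above, the relative reduction can be recomputed from the two absolute sizes. This is only a small Python sketch; the helper name `relative_reduction` is made up for illustration and is not part of HeuristicLab.

```python
def relative_reduction(before_mb: float, after_mb: float) -> float:
    """Return the fraction by which the size shrank (0.0 to 1.0)."""
    return (before_mb - after_mb) / before_mb


# Uncompressed sizes quoted in the description: 16.0 MB down to 6.3 MB.
uncompressed = relative_reduction(16.0, 6.3)
print(f"{uncompressed:.1%}")  # prints 60.6%
```

The result is consistent with the roughly 60% quoted for the uncompressed size.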
Change History (11)
comment:1 Changed 13 years ago by abeham
comment:2 Changed 13 years ago by swagner
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.6
comment:3 Changed 13 years ago by abeham
Unfortunately, it's not as easy to implement as I thought. CollectParameterValues is implemented in ParameterizedNamedItem, which resides in HeuristicLab.Core. Adding the name of an operator instead of the whole operator would require creating a new StringValue with the name, and StringValue in turn is a type from HeuristicLab.Data.
We could
- reimplement the functionality of CollectParameterValues in Algorithm, Problem, and Operator, and possibly in other types that are ParameterizedNamedItems and can contain operators.
- add a method IItem GetCollectedValue() to IParameterizedItem or even IItem, which returns this by default and which is overridden in Operator to return new StringValue(Name)
- delay the ticket and merge Core and Data
Other options?
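To make the second option more concrete, here is a minimal sketch of the idea. HeuristicLab itself is written in C#; this is only a Python rendering, and the classes `Item`, `StringValue`, `IntValue`, and `Operator` below are simplified, hypothetical stand-ins, not the real HeuristicLab types.

```python
class Item:
    """Hypothetical stand-in for IItem."""

    def get_collected_value(self) -> "Item":
        # By default an item is collected as itself.
        return self


class StringValue(Item):
    """Hypothetical stand-in for the StringValue data type."""

    def __init__(self, value: str) -> None:
        self.value = value


class IntValue(Item):
    """Hypothetical stand-in for a plain data item that is collected unchanged."""

    def __init__(self, value: int) -> None:
        self.value = value


class Operator(Item):
    """Hypothetical stand-in for an operator with a name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def get_collected_value(self) -> Item:
        # Operators are collected by name only instead of as a full clone.
        return StringValue(self.name)


crossover = Operator("OrderCrossover")
print(crossover.get_collected_value().value)  # prints OrderCrossover
```

With this approach, the code that builds the run would call `get_collected_value()` on every parameter value and would not need to know about operators at all.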
comment:4 Changed 13 years ago by swagner
- Milestone changed from HeuristicLab 3.3.6 to HeuristicLab 3.3.7
Thanks abeham for your thoughts on this issue. We should spend some more time on discussing how to implement this. Therefore I move this ticket to 3.3.7 for now.
comment:5 Changed 13 years ago by abeham
- Owner changed from swagner to abeham
- Status changed from new to accepted
comment:6 Changed 13 years ago by abeham
r7579: I have solved this issue now. I found that CollectParameterValues was too monolithic: you had to override and re-implement the whole method if you wanted to change just one detail (in this case, that operators are stored by their name). So I split CollectParameterValues into two separate logical parts:
- CollectParameterValues is iterating over the parameters
- GetCollectedValues decides what values are collected from the given parameter value
Algorithm and Problem now override only GetCollectedValues, but reuse the implementation of the base class in that they only filter its values. When they see an IOperator, they convert it to its name instead. Using IEnumerable and yield, I think that's a nice solution.
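The split described above can be sketched roughly as follows. This is a Python rendering using generators in place of C# IEnumerable and yield; it is not the code committed in r7579, and the class layout, method signatures, and parameter dictionary are simplifying assumptions made for illustration.

```python
from typing import Any, Dict, Iterator, Tuple


class Operator:
    """Hypothetical stand-in for an IOperator with a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class ParameterizedNamedItem:
    """Simplified base class: parameters are a plain name-to-value dict."""

    def __init__(self, name: str, parameters: Dict[str, Any]) -> None:
        self.name = name
        self.parameters = parameters

    def collect_parameter_values(self, values: Dict[str, Any]) -> None:
        # Part 1: iterate over the parameters.
        for param_name, value in self.parameters.items():
            for key, collected in self.get_collected_values(param_name, value):
                values[key] = collected

    def get_collected_values(self, param_name: str, value: Any) -> Iterator[Tuple[str, Any]]:
        # Part 2: decide what is collected for one parameter value.
        yield param_name, value


class Algorithm(ParameterizedNamedItem):
    def get_collected_values(self, param_name: str, value: Any) -> Iterator[Tuple[str, Any]]:
        # Reuse the base implementation and only filter its output:
        # operators are replaced by their name.
        for key, collected in super().get_collected_values(param_name, value):
            if isinstance(collected, Operator):
                yield key, collected.name
            else:
                yield key, collected


ga = Algorithm("GA", {"PopulationSize": 100, "Crossover": Operator("OrderCrossover")})
run_values: Dict[str, Any] = {}
ga.collect_parameter_values(run_values)
print(run_values)  # prints {'PopulationSize': 100, 'Crossover': 'OrderCrossover'}
```

The iteration logic stays in one place, and a subclass only changes what is emitted per value.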
comment:7 Changed 13 years ago by abeham
- Owner changed from abeham to mkommend
- Status changed from accepted to reviewing
Please forward this to swagner when you've found no further issues.
comment:8 Changed 13 years ago by abeham
- Owner changed from mkommend to abeham
- Status changed from reviewing to assigned
I found a bug where parameters of the operators were not added to the run if the operator itself was not added to the run.
comment:9 Changed 13 years ago by abeham
- Owner changed from abeham to mkommend
- Status changed from assigned to reviewing
r7706: fixed bug in parameter collection
comment:10 Changed 13 years ago by mkommend
- Owner changed from mkommend to abeham
- Status changed from reviewing to readytorelease
Thanks for implementing this enhancement.
comment:11 Changed 12 years ago by mkommend
- Resolution set to done
- Status changed from readytorelease to closed
- Version changed from 3.3.5 to 3.3.7
I looked a bit deeper into the issue by examining the size of the serialized XML of a single run.
I took one run of the GA-TSP sample where Coordinates, DistanceMatrix, and BestSolution have not been collected. The run is copied to a run collection and saved. Then the operators are removed by a runcollection modifier, the modifier is removed and the run is saved again. These two are then compared:
However, the contribution to file size varies greatly between the operators; the range was [40;120] for the non-analyzer operators. A result consisting of a single string uses just 4 lines, so there is considerable potential for savings. I did not look into time differences.