Opened 4 years ago

Closed 4 years ago

#2055 closed feature request (done)

Tool to reduce the file size of data analysis experiments

Reported by: mkommend
Owned by: ascheibe
Priority: medium
Milestone: HeuristicLab 3.3.9
Component: Tools
Version: 3.3.8
Keywords:
Cc:

Description

Files storing data analysis experiments executed on the Hive grow rather large, because the same dataset is saved multiple times in each file. Therefore, a small utility tool which removes duplicate datasets from saved HL files would be nice to have.
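To illustrate the idea, here is a minimal sketch of duplicate-dataset removal, written in Python for brevity rather than in the C# of the actual tool. `Dataset`, `Run`, `content_key` and `share_duplicate_datasets` are hypothetical names, not HeuristicLab types. The sketch also assumes that a serializer writes an object referenced from several runs only once, so that sharing one instance is what shrinks the file.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset: variable name -> column values."""
    columns: dict


@dataclass
class Run:
    """Hypothetical stand-in for one saved run carrying its own dataset copy."""
    name: str
    dataset: Dataset


def content_key(dataset):
    # Order-preserving, hashable summary of variable names and their values.
    return tuple((name, tuple(values)) for name, values in dataset.columns.items())


def share_duplicate_datasets(runs):
    """Point all runs with equal dataset content at one shared instance.

    Returns the number of distinct datasets that remain.
    """
    canonical = {}
    for run in runs:
        key = content_key(run.dataset)
        # The first dataset seen with this content becomes the shared instance.
        run.dataset = canonical.setdefault(key, run.dataset)
    return len(canonical)


runs = [
    Run("run 1", Dataset({"x": [1.0, 2.0], "y": [3.0, 4.0]})),
    Run("run 2", Dataset({"x": [1.0, 2.0], "y": [3.0, 4.0]})),
    Run("run 3", Dataset({"x": [9.0, 8.0], "y": [7.0, 6.0]})),
]
remaining = share_duplicate_datasets(runs)
```

After the call, the first two runs reference the same `Dataset` object, while the third keeps its own because its content differs.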

Change History (19)

comment:1 Changed 4 years ago by mkommend

  • Status changed from new to accepted

comment:2 Changed 4 years ago by mkommend

r9497: Added first version of HL.FileShrinker.

comment:3 Changed 4 years ago by mkommend

Performed first tests of the file shrinker on my Eurocast and GECCO experiments. Each folder took around 30 minutes to process, and the folder size was reduced from 2.16 GB to 141 MB (Eurocast) and from 1.34 GB to 123 MB (GECCO).

comment:4 Changed 4 years ago by abeham

  • Owner changed from mkommend to architects
  • Status changed from accepted to assigned

We could think about expanding our file format to include separate "input files" that may be linked from the main serialization file.

comment:5 Changed 4 years ago by gkronber

  • Owner changed from architects to mkommend

A Tools menu item that calls the functionality provided by the command line program should be added to the Optimizer.

comment:6 Changed 4 years ago by mkommend

  • Milestone set to HeuristicLab 3.3.9

comment:7 Changed 4 years ago by abeham

  • Version changed from 3.3.8 to branch

comment:8 Changed 4 years ago by mkommend

  • Status changed from assigned to accepted

comment:9 Changed 4 years ago by mkommend

  • Version changed from branch to 3.3.8

comment:10 Changed 4 years ago by mkommend

r9859: Added menu item that removes duplicate datasets.

comment:11 Changed 4 years ago by mkommend

r9860: Added new menu item for data analysis commands.

comment:12 Changed 4 years ago by mkommend

r9861: Removed fileshrinker from tools directory.

comment:13 Changed 4 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing

comment:14 Changed 4 years ago by gkronber

I reviewed the code of the ShrinkDataAnalysisRunsMenuItem but didn't understand why it is necessary to create the variableValuesGetter and variableValuesSetter via Expressions. Apparently, this also allows setting private fields?

Please add a comment explaining what you are doing in the static initializer and why this is necessary.

Please also add a comment on line 106 stating that you are comparing variable names: if (!values1.Keys.SequenceEqual(values2.Keys)) return false;
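For readers unfamiliar with the check on line 106, the following is a rough Python analogue rather than the reviewed C# code. The function name `same_variable_values` and the value comparison after the name check are assumptions; the ticket only confirms that the variable names are compared as sequences.

```python
def same_variable_values(values1, values2):
    """Return True if both mappings hold the same variables with the same values."""
    # Compare the variable names first, position by position. This mirrors
    # Keys.SequenceEqual: same names in a different order count as unequal.
    if list(values1.keys()) != list(values2.keys()):
        return False
    # Assumed follow-up step: compare the values of each variable.
    return all(list(values1[name]) == list(values2[name]) for name in values1)


a = {"x": [1.0, 2.0], "y": [3.0, 4.0]}
b = {"x": [1.0, 2.0], "y": [3.0, 4.0]}
reordered = {"y": [3.0, 4.0], "x": [1.0, 2.0]}
changed = {"x": [1.0, 2.0], "y": [3.0, 5.0]}

results = (
    same_variable_values(a, b),
    same_variable_values(a, reordered),
    same_variable_values(a, changed),
)
```

Checking the names first is cheap and lets the comparison stop before any column values are touched when two datasets do not even share the same variables.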

comment:15 Changed 4 years ago by mkommend

r9866: Added comments to ShrinkDataAnalysisRunsMenuItem.

comment:16 Changed 4 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from reviewing to readytorelease

comment:17 Changed 4 years ago by ascheibe

  • Owner changed from mkommend to ascheibe
  • Status changed from readytorelease to reviewing

comment:18 Changed 4 years ago by ascheibe

  • Status changed from reviewing to readytorelease

comment:19 Changed 4 years ago by ascheibe

  • Resolution set to done
  • Status changed from readytorelease to closed

r9932: Merged r9859, r9860, r9866 into stable branch.
