Opened 12 years ago

Closed 11 years ago

#2055 closed feature request (done)

Tool to reduce the file size of data analysis experiments

Reported by: mkommend
Owned by: ascheibe
Priority: medium
Milestone: HeuristicLab 3.3.9
Component: Tools
Version: 3.3.8
Keywords:
Cc:

Description

Files storing data analysis experiments executed on the Hive grow rather large because the dataset is saved multiple times in each file. A small utility that removes duplicate datasets from saved HL files would therefore be nice to have.
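The idea can be sketched as follows. This is an illustration only, written in Python for brevity rather than in HeuristicLab's C#, and it is not the code of the actual tool. The run layout (a dict with a "dataset" entry), the names dataset_key and share_duplicate_datasets, and the assumption that the serializer writes a shared object only once are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a dataset: variable name -> column of values.
Dataset = Dict[str, List[float]]


def dataset_key(dataset: Dataset) -> Tuple:
    """Build a hashable key from the variable names (in order) and their values.

    NaN values would need special handling, because NaN != NaN.
    """
    return tuple((name, tuple(values)) for name, values in dataset.items())


def share_duplicate_datasets(runs: List[dict]) -> int:
    """Point every run at the first equal dataset seen; return the number of replaced copies."""
    first_seen: Dict[Tuple, Dataset] = {}
    replaced = 0
    for run in runs:
        # setdefault stores the dataset if its key is new, else returns the earlier one.
        canonical = first_seen.setdefault(dataset_key(run["dataset"]), run["dataset"])
        if canonical is not run["dataset"]:
            run["dataset"] = canonical
            replaced += 1
    return replaced


# Three runs, each carrying its own copy of the same data.
runs = [{"dataset": {"x": [1.0, 2.0], "y": [3.0, 4.0]}} for _ in range(3)]
print(share_duplicate_datasets(runs))  # 2
```

After the call, all three runs reference one dataset object, so a serializer that stores each object once would write the data a single time.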

Change History (19)

comment:1 Changed 12 years ago by mkommend

  • Status changed from new to accepted

comment:2 Changed 12 years ago by mkommend

r9497: Added first version of HL.FileShrinker.

comment:3 Changed 12 years ago by mkommend

Performed first tests of the file shrinker on my Eurocast and GECCO experiments. Each folder took around 30 minutes to process, and the folder size was reduced from 2.16 GB to 141 MB (Eurocast) and from 1.34 GB to 123 MB (GECCO).
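For reference, these sizes correspond to reductions of roughly 93% (Eurocast) and 91% (GECCO). A short Python check of the arithmetic, assuming 1 GB = 1000 MB (the comment does not say which convention was used, and the helper name reduction_percent is made up for this sketch):

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Return how much smaller the folder became, as a percentage of the original size."""
    return 100.0 * (1.0 - after_mb / before_mb)


# Sizes taken from the comment above; GB converted with 1 GB = 1000 MB (assumption).
eurocast = reduction_percent(2160.0, 141.0)
gecco = reduction_percent(1340.0, 123.0)
print(f"Eurocast: {eurocast:.1f}% smaller, GECCO: {gecco:.1f}% smaller")
```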

comment:4 Changed 12 years ago by abeham

  • Owner changed from mkommend to architects
  • Status changed from accepted to assigned

We could think about expanding our file format to include separate "input files" that may be linked from the main serialization file.

comment:5 Changed 11 years ago by gkronber

  • Owner changed from architects to mkommend

A Tools menu item that calls the functionality provided by the command-line program should be added to the Optimizer.

comment:6 Changed 11 years ago by mkommend

  • Milestone set to HeuristicLab 3.3.9

comment:7 Changed 11 years ago by abeham

  • Version changed from 3.3.8 to branch

comment:8 Changed 11 years ago by mkommend

  • Status changed from assigned to accepted

comment:9 Changed 11 years ago by mkommend

  • Version changed from branch to 3.3.8

comment:10 Changed 11 years ago by mkommend

r9859: Added menu item that removes duplicate datasets.

comment:11 Changed 11 years ago by mkommend

r9860: Added new menu item for data analysis commands.

comment:12 Changed 11 years ago by mkommend

r9861: Removed fileshrinker from tools directory.

comment:13 Changed 11 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing

comment:14 Changed 11 years ago by gkronber

I reviewed the code of the ShrinkDataAnalysisRunsMenuItem but didn't understand why it is necessary to create the variableValuesGetter and variableValuesSetter via Expressions. Apparently, this also allows private fields to be set?

Please add a comment explaining what you are doing in the static initializer and why this is necessary.

Please also add a comment on line 106 stating that you are comparing variable names: if(!values1.Keys.SequenceEqual(values2.Keys)) return false;
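To illustrate what such a comment would describe, here is a minimal Python sketch of a two-step dataset comparison. Only the first step, the ordered comparison of variable names, corresponds to the check quoted above. The value comparison and the function name datasets_equal are assumptions, since the rest of the method is not shown in this ticket.

```python
from typing import Dict, List


def datasets_equal(values1: Dict[str, List[float]], values2: Dict[str, List[float]]) -> bool:
    """Compare two datasets given as variable name -> column values."""
    # Step 1: compare the variable names, in order (the analogue of Keys.SequenceEqual).
    # A plain dict == would ignore the order of the names, so compare the key lists.
    if list(values1.keys()) != list(values2.keys()):
        return False
    # Step 2 (assumed): compare the values of each variable column by column.
    return all(values1[name] == values2[name] for name in values1)


same = datasets_equal({"x": [1.0], "y": [2.0]}, {"x": [1.0], "y": [2.0]})
reordered = datasets_equal({"x": [1.0], "y": [2.0]}, {"y": [2.0], "x": [1.0]})
print(same, reordered)  # True False
```

Checking the names first is cheap and rejects most mismatching datasets before any values are compared.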

comment:15 Changed 11 years ago by mkommend

r9866: Added comments to ShrinkDataAnalysisRunsMenuItem.

comment:16 Changed 11 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from reviewing to readytorelease

comment:17 Changed 11 years ago by ascheibe

  • Owner changed from mkommend to ascheibe
  • Status changed from readytorelease to reviewing

comment:18 Changed 11 years ago by ascheibe

  • Status changed from reviewing to readytorelease

comment:19 Changed 11 years ago by ascheibe

  • Resolution set to done
  • Status changed from readytorelease to closed

r9932: Merged r9859, r9860, and r9866 into the stable branch.
