Opened 11 months ago

Last modified 2 months ago

#2883 reviewing feature request

Option to store actual model instead of a surrogate-model for RegressionSolutions

Reported by: pfleck
Owned by: mkommend
Priority: medium
Milestone: HeuristicLab 3.3.16
Component: Algorithms.DataAnalysis
Version: branch
Keywords:
Cc:

Description

To save memory, a gradient boosted trees (GBT) model is not stored directly. Instead, a surrogate GBT model is stored that holds the information needed to recreate the actual GBT model. However, recreating the GBT model is CPU-intensive. A similar approach is used for random forests.

Currently, we have the option to disable solution creation so that no models are created at all. An additional option to store the actual model instead of the surrogate model would be useful, because it avoids recalculating the model. This way, the user can choose a trade-off between memory consumption and computational effort.
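
The surrogate idea described above can be sketched as follows. This is a minimal illustration, not HeuristicLab code: `SurrogateModel`, `train_mean_model`, and `rebuild_count` are hypothetical names, and the "training" step is a trivial stand-in for the expensive GBT training.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


def train_mean_model(targets: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical stand-in for an expensive training run."""
    mean = sum(targets) / len(targets)
    # The "model" ignores its input and always predicts the training mean.
    return lambda x: mean


@dataclass
class SurrogateModel:
    """Keeps only the information needed to recreate the actual model."""
    targets: List[float]
    rebuild_count: int = 0
    _actual: Optional[Callable[[float], float]] = field(default=None, repr=False)

    def predict(self, x: float) -> float:
        # Recreate the actual model lazily, on first use only.
        if self._actual is None:
            self._actual = train_mean_model(self.targets)
            self.rebuild_count += 1
        return self._actual(x)


surrogate = SurrogateModel(targets=[1.0, 2.0, 3.0])
first = surrogate.predict(10.0)   # triggers the (expensive) rebuild
second = surrogate.predict(5.0)   # reuses the rebuilt model
```

Storing the actual model instead would skip the rebuild on first use, at the cost of persisting the full model rather than only the training information.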

Change History (18)

comment:1 Changed 11 months ago by pfleck

  • Owner set to fholzing
  • Status changed from new to assigned
  • Type changed from defect to feature request

comment:2 Changed 11 months ago by fholzing

r15668: Created branch folder

comment:3 Changed 11 months ago by fholzing

  • Version set to branch

comment:4 Changed 11 months ago by fholzing

r15669: Created branch folder

comment:5 Changed 11 months ago by fholzing

r15670: Made the solution compile

comment:6 Changed 11 months ago by fholzing

r15675: Changed from Boolean to Enum

comment:7 Changed 11 months ago by fholzing

r15678: Implemented third option for complete storage

comment:8 Changed 11 months ago by fholzing

r15679: Removed backwards compatibility and moved the decision whether to use a surrogate into the algorithm (one level up)

comment:9 Changed 11 months ago by fholzing

r15687: Adapted to new trunk-structure

comment:10 Changed 10 months ago by fholzing

r15732: Changed ToolTip with new recommendation (see #2890)

comment:11 Changed 10 months ago by fholzing

  • Owner changed from fholzing to mkommend
  • Status changed from assigned to reviewing

comment:12 Changed 8 months ago by mkommend

  • Owner changed from mkommend to pfleck

comment:13 Changed 3 months ago by pfleck

r16158 merged trunk

comment:14 Changed 3 months ago by pfleck

  • Owner changed from pfleck to fholzing
  • Status changed from reviewing to assigned

Functionality

  • Everything works fine: selecting Parameters keeps the file size small but requires time to re-run the model; selecting Complete increases the file size drastically, but re-running the algorithm is not required.
  • Saving or loading a GBT Solution (Complete, or Parameters where the model is already created) causes HeuristicLab to freeze for 2-3 seconds until the file save/load progress shows.

Code

  • Please restore the original coding style (e.g. opening braces on the same line in GradientBoostedTreesAlgorithm.cs, GradientBoostedTreesModel.cs and GradientBoostedTreesModelSurrogate.cs).
  • Put ModelStorage into namespace HeuristicLab.Algorithms.DataAnalysis.
  • Otherwise, the code looks good.

Discussion

  • I think the names for the ModelStorage enum, its values and the CreateSolution parameter/setting should be changed. Currently, the concept of "what is created" and the concept of "how it is stored" are mixed. The property name indicates that we control what results are created, and the property type indicates that we control how the model is stored. We should find a concise way to address and name those concepts clearly.
  • The functionality implemented in this ticket would also be very interesting for the RandomForest. Please open a separate ticket for this.
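
One way to make the two concepts explicit is a single option whose values each answer both questions. The sketch below is only an illustration under assumed names: `ModelOption`, its three values, and the two helper functions are hypothetical and do not come from the branch.

```python
from enum import Enum


class ModelOption(Enum):
    """Hypothetical single setting covering creation and storage."""
    NO_MODEL = 0         # no solution is created at all
    SURROGATE_MODEL = 1  # solution is created, only rebuild information is stored
    FULL_MODEL = 2       # solution is created, the trained model itself is stored


def creates_solution(option: ModelOption) -> bool:
    # "What is created": every value except NO_MODEL yields a solution.
    return option is not ModelOption.NO_MODEL


def stores_full_model(option: ModelOption) -> bool:
    # "How it is stored": only FULL_MODEL persists the trained model.
    return option is ModelOption.FULL_MODEL
```

With names like these, each value states both what is created and how it is stored, so the property name and the property type no longer point at different concepts.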

comment:15 Changed 2 months ago by fholzing

r16220: Changed formatting to adhere to the coding guidelines

comment:16 Changed 2 months ago by fholzing

r16229: Implemented review points (renamed ModelStorage to ModelCreation and gave the enum values better names); also added a more detailed description.

comment:17 Changed 2 months ago by fholzing

Ticket to implement the same behaviour for RandomForest is now available (#2952).

comment:18 Changed 2 months ago by fholzing

  • Owner changed from fholzing to mkommend
  • Status changed from assigned to reviewing