Opened 3 years ago

Closed 16 months ago

#2883 closed feature request (done)

Option to store actual model instead of a surrogate-model for GBT-Solutions

Reported by: pfleck
Owned by: mkommend
Priority: medium
Milestone: HeuristicLab 3.3.16
Component: Algorithms.DataAnalysis
Version: trunk
Keywords:
Cc:

Description

To save memory, a GBT model is not stored directly, but a surrogate-GBT model is stored that holds information to recreate the actual GBT model. However, recreating the GBT model is CPU-intensive. A similar method is also used for random forests.

Currently, we have the option to disable solution creation to avoid creating any models. An additional option to store the actual model instead of the surrogate model would be useful, because it avoids the recalculation. This way, the user can choose the trade-off between memory consumption and computational effort.
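
The requested trade-off can be sketched roughly as follows. This is only an illustration under stated assumptions: StorageMode, TrainedModel, StoredResult and StorageSketch are hypothetical names, not the HeuristicLab API, and the "training" is a cheap deterministic stand-in for building a GBT model.

```csharp
using System;

// Hypothetical sketch; none of these types exist in HeuristicLab.
public enum StorageMode { None, Parameters, Complete }

public sealed class TrainedModel {
  public double[] Weights { get; }
  public TrainedModel(double[] weights) { Weights = weights; }
}

public sealed class StoredResult {
  public int Seed { get; }
  public int Size { get; }
  public TrainedModel Model { get; } // null when only the parameters are stored

  public StoredResult(int seed, int size, TrainedModel model) {
    Seed = seed;
    Size = size;
    Model = model;
  }
}

public static class StorageSketch {
  public static int TrainingRuns; // counts how often the expensive step runs

  // Stand-in for CPU-intensive training; deterministic for a given seed.
  public static TrainedModel Train(int seed, int size) {
    TrainingRuns++;
    var rng = new Random(seed);
    var weights = new double[size];
    for (int i = 0; i < size; i++) weights[i] = rng.NextDouble();
    return new TrainedModel(weights);
  }

  // The algorithm trains once in any case; the mode decides what is kept.
  public static StoredResult Run(StorageMode mode, int seed, int size) {
    var model = Train(seed, size);
    switch (mode) {
      case StorageMode.None: return null;                                        // no solution at all
      case StorageMode.Parameters: return new StoredResult(seed, size, null);    // small, retrain on demand
      default: return new StoredResult(seed, size, model);                       // large, no retraining
    }
  }

  // Returns the stored model, or recreates it from the stored parameters.
  public static TrainedModel GetModel(StoredResult result) {
    return result.Model ?? Train(result.Seed, result.Size);
  }
}
```

In this sketch, calling GetModel on a result created with StorageMode.Parameters triggers another training run, while a result created with StorageMode.Complete returns the stored model directly at the cost of keeping all weights in memory and in the saved file.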

Change History (33)

comment:1 Changed 3 years ago by pfleck

  • Owner set to fholzing
  • Status changed from new to assigned
  • Type changed from defect to feature request

comment:2 Changed 3 years ago by fholzing

r15668: Created branch folder

comment:3 Changed 3 years ago by fholzing

  • Version set to branch

comment:4 Changed 3 years ago by fholzing

r15669: Created branch folder

comment:5 Changed 3 years ago by fholzing

r15670: Made the solution compile

comment:6 Changed 3 years ago by fholzing

r15675: Changed from Boolean to Enum

comment:7 Changed 3 years ago by fholzing

r15678: Implemented third option for complete storage

comment:8 Changed 3 years ago by fholzing

r15679: Removed backwards compatibility and moved the decision whether to use a surrogate one level up, into the algorithm

comment:9 Changed 3 years ago by fholzing

r15687: Adapted to new trunk-structure

comment:10 Changed 3 years ago by fholzing

r15732: Changed ToolTip with new recommendation (see #2890)

comment:11 Changed 3 years ago by fholzing

  • Owner changed from fholzing to mkommend
  • Status changed from assigned to reviewing

comment:12 Changed 3 years ago by mkommend

  • Owner changed from mkommend to pfleck

comment:13 Changed 2 years ago by pfleck

r16158: Merged trunk

comment:14 Changed 2 years ago by pfleck

  • Owner changed from pfleck to fholzing
  • Status changed from reviewing to assigned

Functionality

  • Everything works fine: selecting Parameters keeps the file size small but requires time to re-run the model; selecting Complete increases the file size drastically but re-running the algorithm is not required.
  • Saving and loading a GBT Solution (Complete, or Parameters where the model is already created) causes HL to freeze for 2-3 seconds until the file-save/load progress shows.

Code

  • Please restore the original coding style (e.g. opening braces on the same line in GradientBoostedTreesAlgorithm.cs, GradientBoostedTreesModel.cs and GradientBoostedTreesModelSurrogate.cs).
  • Put ModelStorage into namespace HeuristicLab.Algorithms.DataAnalysis.
  • Otherwise, the code looks good.

Discussion

  • I think the names for the ModelStorage-enum, its values and the CreateSolution Parameter/Setting should be changed. Currently, the concepts of "what is created" and "how it is stored" are mixed. The property-name indicates that we control what results are created, and the property-type indicates that we control how the model is stored. We should find a concise way to address and name those concepts clearly.
  • The functionality implemented in this ticket would also be very interesting for the RandomForest. Please open a separate ticket for this.

comment:15 Changed 2 years ago by fholzing

r16220: Changed formatting to adhere to the coding guidelines

comment:16 Changed 2 years ago by fholzing

r16229: Implemented review points (renamed ModelStorage to ModelCreation and gave the enum values better names); also added a more informative description.

comment:17 Changed 2 years ago by fholzing

Ticket to implement the same behaviour for RandomForest is now available (#2952).

comment:18 Changed 2 years ago by fholzing

  • Owner changed from fholzing to mkommend
  • Status changed from assigned to reviewing

comment:19 Changed 17 months ago by mkommend

r17030: Merged all changesets into trunk.

Could not perform a merge because of the issues with our sources folder; instead I reapplied the changes.

comment:20 Changed 17 months ago by mkommend

  • Status changed from reviewing to readytorelease
  • Version changed from branch to trunk

r17031: Deleted branch for GBT Model storage.

comment:21 Changed 17 months ago by mkommend

r17032: Adapted comment in ModelCreation enum.

comment:22 Changed 17 months ago by mkommend

r17033: Adapted unit test for GBTs.

comment:23 Changed 17 months ago by mkommend

r17043: Removed outdated comment in GBTModel.

comment:24 Changed 17 months ago by mkommend

  • Summary changed from Option to store actual model instead of a surrogate-model for RegressionSolutions to Option to store actual model instead of a surrogate-model for GBT-Solutions

comment:25 Changed 17 months ago by mkommend

r17044: Initialized Lazy object in GBTModelSurrogate.

comment:26 Changed 17 months ago by abeham

  • Keywords depends-2520 added

comment:27 Changed 17 months ago by pfleck

If a GBT Solution using a GradientBoostedTreesModelSurrogate is cloned, the re-training procedure is triggered. This happens, for example, when downloading a Hive Job that contains a GBT Solution.

Recalculation is triggered in the cloning constructor of the ModelSurrogate when accessing the ActualModel property in if (original.ActualModel != null). Using original.actualModel.IsValueCreated instead fixes the problem.
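
The difference between the two checks can be reproduced with a small stand-alone sketch. It assumes the surrogate keeps its model in a .NET Lazy<T> field, as the r17044 commit message suggests; SurrogateSketch, ExpensiveModel and the two clone methods are hypothetical names and not the actual HeuristicLab classes.

```csharp
using System;

// Hypothetical stand-in for the expensive-to-build GBT model.
public sealed class ExpensiveModel { }

public sealed class SurrogateSketch {
  public static int RebuildCount; // counts how often the model is recreated

  private readonly Lazy<ExpensiveModel> actualModel;

  // Accessing this property forces the lazy value to be created.
  public ExpensiveModel ActualModel { get { return actualModel.Value; } }

  // Only inspects the lazy state; never triggers creation.
  public bool IsModelCreated { get { return actualModel.IsValueCreated; } }

  public SurrogateSketch() {
    actualModel = new Lazy<ExpensiveModel>(Rebuild);
  }

  private SurrogateSketch(SurrogateSketch original, bool forceAccess) {
    bool alreadyBuilt = forceAccess
      ? original.ActualModel != null           // problematic: reading .Value rebuilds the model
      : original.actualModel.IsValueCreated;   // fix: checks without rebuilding
    if (alreadyBuilt) {
      // A real implementation would clone the model; sharing keeps the sketch short.
      var existing = original.actualModel.Value;
      actualModel = new Lazy<ExpensiveModel>(() => existing);
    } else {
      actualModel = new Lazy<ExpensiveModel>(Rebuild);
    }
  }

  public SurrogateSketch CloneForcingAccess() { return new SurrogateSketch(this, true); }
  public SurrogateSketch CloneChecked() { return new SurrogateSketch(this, false); }

  private static ExpensiveModel Rebuild() {
    RebuildCount++;
    return new ExpensiveModel();
  }
}
```

In this sketch, CloneChecked leaves RebuildCount unchanged for a surrogate whose model has not been created yet, whereas CloneForcingAccess rebuilds the model of the original as a side effect of the null check.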

comment:28 Changed 17 months ago by mkommend

  • Owner changed from mkommend to pfleck
  • Status changed from readytorelease to assigned

Thank you Philipp for spotting this error. Could you please change the code to use the IsValueCreated property? The same bug might be present in #2952 as the implementation is closely related to this ticket.

comment:29 Changed 17 months ago by pfleck

  • Owner changed from pfleck to mkommend
  • Status changed from assigned to reviewing

r17137: Fixed triggering model recalculation when cloning.

comment:30 Changed 16 months ago by mkommend

  • Status changed from reviewing to readytorelease

Reviewed and tested r17137.

comment:31 Changed 16 months ago by abeham

  • Keywords depends-2520 removed

comment:32 Changed 16 months ago by gkronber

  • Keywords merged added

r17156: merged r17030, r17032, r17033, r17043, r17044, r17137 from trunk to stable

comment:33 Changed 16 months ago by jkarder

  • Keywords merged removed
  • Resolution set to done
  • Status changed from readytorelease to closed