Opened 7 years ago
Closed 5 years ago
#2883 closed feature request (done)
Option to store actual model instead of a surrogate-model for GBT-Solutions
| Reported by: | pfleck | Owned by: | mkommend |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.16 |
| Component: | Algorithms.DataAnalysis | Version: | trunk |
| Keywords: | | Cc: | |
Description
To save memory, a GBT model is not stored directly. Instead, a surrogate GBT model is stored that holds the information needed to recreate the actual GBT model. However, recreating the GBT model is CPU-intensive. A similar approach is used for random forests.
Currently, we have the option to disable solution creation to avoid creating any models. An additional option to store the actual model instead of the surrogate model would be useful, because it avoids recalculating the model. This way, the user can choose a trade-off between memory consumption and computational effort.
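The trade-off described above can be illustrated with a minimal sketch. It is written in Python rather than HeuristicLab's C#, and every name in it (`train`, `save`, `load`, the mode strings) is hypothetical. The `train` function is only a deterministic stand-in for the expensive GBT training, so this shows the idea, not the actual implementation.

```python
import json
import random


def train(seed, n_trees):
    """Stand-in for expensive GBT training; deterministic for a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_trees)]


def save(mode, seed, n_trees):
    """Serialize either the trained model or only what is needed to retrain it."""
    if mode == "complete":
        # Large file, but nothing has to be recomputed on load.
        payload = {"mode": mode, "model": train(seed, n_trees)}
    elif mode == "parameters":
        # Small file (surrogate), but the model must be retrained on load.
        payload = {"mode": mode, "seed": seed, "n_trees": n_trees}
    else:
        raise ValueError(f"unknown storage mode: {mode}")
    return json.dumps(payload)


def load(text):
    """Return the model, retraining it if only the parameters were stored."""
    payload = json.loads(text)
    if payload["mode"] == "complete":
        return payload["model"]
    return train(payload["seed"], payload["n_trees"])
```

Both modes yield the same model after loading; they differ only in where the cost is paid: storage size for the complete model, CPU time on load for the surrogate.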
Change History (33)
comment:1 Changed 7 years ago by pfleck
- Owner set to fholzing
- Status changed from new to assigned
- Type changed from defect to feature request
comment:2 Changed 7 years ago by fholzing
comment:3 Changed 7 years ago by fholzing
- Version set to branch
comment:4 Changed 7 years ago by fholzing
r15669: Created branch folder
comment:5 Changed 7 years ago by fholzing
r15670: Made the solution compile
comment:6 Changed 7 years ago by fholzing
r15675: Changed from Boolean to Enum
comment:7 Changed 7 years ago by fholzing
r15678: Implemented third option for complete storage
comment:8 Changed 7 years ago by fholzing
r15679: Removed backwards compatibility and moved the decision whether to use a surrogate one level up, into the algorithm
comment:9 Changed 7 years ago by fholzing
r15687: Adapted to new trunk-structure
comment:10 Changed 7 years ago by fholzing
comment:11 Changed 7 years ago by fholzing
- Owner changed from fholzing to mkommend
- Status changed from assigned to reviewing
comment:12 Changed 7 years ago by mkommend
- Owner changed from mkommend to pfleck
comment:13 Changed 6 years ago by pfleck
r16158 merged trunk
comment:14 Changed 6 years ago by pfleck
- Owner changed from pfleck to fholzing
- Status changed from reviewing to assigned
Functionality
- Everything works fine: selecting Parameters keeps the file size small but requires time to re-run the model; selecting Complete increases the file size drastically, but re-running the algorithm is not required.
- Saving and loading a GBT Solution (Complete, or Parameters where the model is already created) causes HL to freeze for 2-3 seconds until the file save/load progress shows.
Code
- Please restore the original coding style (e.g. opening braces on the same line in GradientBoostedTreesAlgorithm.cs, GradientBoostedTreesModel.cs and GradientBoostedTreesModelSurrogate.cs)
- Put ModelStorage into namespace HeuristicLab.Algorithms.DataAnalysis.
- Otherwise, the code looks good.
Discussion
- I think the names for the ModelStorage enum, its values and the CreateSolution parameter/setting should be changed. Currently, the concepts of "what is created" and "how it is stored" are mixed. The property name indicates that we control what results are created, and the property type indicates that we control how the model is stored. We should find a concise way to address and name those concepts clearly.
- The functionality implemented in this ticket would also be very interesting for the RandomForest. Please open a separate ticket for this.
comment:15 Changed 6 years ago by fholzing
r16220: Changed formatting to adhere to the coding guidelines
comment:16 Changed 6 years ago by fholzing
r16229: Implemented review points (renamed ModelStorage to ModelCreation and gave the enum values better names); also added a more descriptive description.
comment:17 Changed 6 years ago by fholzing
Ticket to implement the same behaviour for RandomForest is now available (#2952).
comment:18 Changed 6 years ago by fholzing
- Owner changed from fholzing to mkommend
- Status changed from assigned to reviewing
comment:19 Changed 5 years ago by mkommend
r17030: Merged all changesets into trunk.
Could not perform a merge because of the issues with our sources folder; instead I reapplied the changes.
comment:20 Changed 5 years ago by mkommend
- Status changed from reviewing to readytorelease
- Version changed from branch to trunk
r17031: Deleted branch for GBT Model storage.
comment:21 Changed 5 years ago by mkommend
r17032: Adapted comment in ModelCreation enum.
comment:22 Changed 5 years ago by mkommend
r17033: Adapted unit test for GBTs.
comment:23 Changed 5 years ago by mkommend
r17043: Removed outdated comment in GBTModel.
comment:24 Changed 5 years ago by mkommend
- Summary changed from Option to store actual model instead of a surrogate-model for RegressionSolutions to Option to store actual model instead of a surrogate-model for GBT-Solutions
comment:25 Changed 5 years ago by mkommend
r17044: Initialized Lazy object in GBTModelSurrogate.
comment:26 Changed 5 years ago by abeham
- Keywords depends-2520 added
comment:27 Changed 5 years ago by pfleck
If a GBT Solution using a GradientBoostedTreesModelSurrogate is cloned, the re-training procedure is triggered. This happens, for example, when downloading a Hive Job that contains a GBT Solution.
Recalculation is triggered in the cloning constructor of the ModelSurrogate when accessing the ActualModel property in if (original.ActualModel != null). Using original.actualModel.IsValueCreated instead fixes the problem.
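The effect can be reproduced with a small sketch. It is written in Python, not the project's C#, and the `Lazy` and `ModelSurrogate` classes below are hypothetical stand-ins that only mimic the behaviour of .NET's `Lazy<T>` (`Value`, `IsValueCreated`); the real cloning constructor may differ in detail.

```python
class Lazy:
    """Minimal stand-in for .NET's Lazy<T>: computes its value on first access."""

    def __init__(self, factory):
        self._factory = factory
        self._created = False
        self._value = None

    @property
    def is_value_created(self):
        # Checking this never triggers the factory.
        return self._created

    @property
    def value(self):
        if not self._created:
            self._value = self._factory()
            self._created = True
        return self._value


class ModelSurrogate:
    """Hypothetical surrogate that retrains the model lazily."""

    def __init__(self, seed, retrain_log):
        self.seed = seed
        self._retrain_log = retrain_log  # records every (expensive) retraining
        self._actual_model = Lazy(self._retrain)

    def _retrain(self):
        self._retrain_log.append(self.seed)
        return ("model", self.seed)

    @property
    def actual_model(self):
        return self._actual_model.value

    def clone_buggy(self):
        clone = ModelSurrogate(self.seed, self._retrain_log)
        model = self.actual_model  # the null check forces retraining
        if model is not None:
            clone._actual_model = Lazy(lambda: model)
        return clone

    def clone_fixed(self):
        clone = ModelSurrogate(self.seed, self._retrain_log)
        if self._actual_model.is_value_created:  # no retraining triggered
            model = self._actual_model.value
            clone._actual_model = Lazy(lambda: model)
        return clone
```

With `clone_fixed`, a surrogate whose model has not been created yet is cloned without retraining, and an already created model is passed on to the clone instead of being recomputed.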
comment:28 Changed 5 years ago by mkommend
- Owner changed from mkommend to pfleck
- Status changed from readytorelease to assigned
Thank you Philipp for spotting this error. Could you please change the code to use the IsValueCreated property? The same bug might be present in #2952 as the implementation is closely related to this ticket.
comment:29 Changed 5 years ago by pfleck
- Owner changed from pfleck to mkommend
- Status changed from assigned to reviewing
r17137 Fixed triggering model recalculation when cloning.
comment:30 Changed 5 years ago by mkommend
- Status changed from reviewing to readytorelease
Reviewed and tested r17137.
comment:31 Changed 5 years ago by abeham
- Keywords depends-2520 removed
comment:32 Changed 5 years ago by gkronber
- Keywords merged added
comment:33 Changed 5 years ago by jkarder
- Keywords merged removed
- Resolution set to done
- Status changed from readytorelease to closed
r15668: Created branch folder