Opened 9 years ago
Closed 9 years ago
#2450 closed defect (done)
Persistence of gradient boosted trees solutions takes a long time and creates really big files
| Reported by: | gkronber | Owned by: | gkronber |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.13 |
| Component: | Algorithms.DataAnalysis | Version: | 3.3.12 |
| Keywords: | | Cc: | |
Description
The reasons are the same as for random forests (see #1721)
Change History (15)
comment:1 Changed 9 years ago by gkronber
- Owner set to gkronber
- Status changed from new to accepted
comment:2 Changed 9 years ago by gkronber
comment:3 Changed 9 years ago by gkronber
r12873: derived ILossFunction from IItem to allow execution on Hive without the privileged flag (this made an "after deserialization" hook necessary to convert the parameter type)
comment:4 Changed 9 years ago by gkronber
r12875: fixed constructors for loss functions
comment:5 Changed 9 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from accepted to reviewing
comment:6 Changed 9 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from reviewing to assigned
Reviewed r12868, r12873, and r12875.
It is a good idea to save the algorithm's parameterization instead of the whole model. However, I would remove usages of the GBTModel wherever possible (deprecate?). For example, GBTAlgorithmStatic.TrainGbm returns a GBTModel instead of a surrogate; that model cannot be loaded anymore and is therefore useless.
comment:7 Changed 9 years ago by gkronber
    using System;
    using System.Linq;
    using System.Collections.Generic;
    using HeuristicLab.Core;
    using HeuristicLab.Common;
    using HeuristicLab.Algorithms.DataAnalysis;
    using HeuristicLab.Problems.DataAnalysis;

    public class MyScript : HeuristicLab.Scripting.CSharpScriptBase {
      public override void Main() {
        // type your code here
        var data = (IRegressionProblemData)vars["problemData"];
        var model = GradientBoostedTreesAlgorithmStatic.TrainGbm(data, new SquaredErrorLoss(), 100, 0.1, 1, 1, 10);
        vars["model"] = model;
      }
    }
comment:8 Changed 9 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from assigned to reviewing
- Marked the constructor of GBTModel obsolete.
- Wrapped GBTModels in GBTModelSurrogates where necessary in the API.
- Removed an internal unused method from the API.
- Removed explicit usage of GBTModelSurrogate in GBTAlgorithm because gbtState.GetModel() already returns the correct type.
The changes can be tested with the script above and the GBT algorithm in HeuristicLab:
- Run the algorithm (either script or through GUI)
- Store and load the solution from disk
- Compare estimated values
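The last step could be automated with a small helper like the one below. This is only a sketch: EstimateComparison, EstimatesMatch and the default tolerance are hypothetical and not part of HeuristicLab; the helper only assumes that the estimated values of the original and the reloaded solution are available as sequences of doubles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of HeuristicLab.
public static class EstimateComparison {
  // Returns true when both sequences have the same length and every pair of
  // values differs by at most 'tolerance'. A NaN on either side counts as a mismatch.
  public static bool EstimatesMatch(IEnumerable<double> before, IEnumerable<double> after,
                                    double tolerance = 1e-12) {
    var a = before.ToArray();
    var b = after.ToArray();
    if (a.Length != b.Length) return false;
    return a.Zip(b, (x, y) => Math.Abs(x - y)).All(d => d <= tolerance);
  }
}
```

Because the surrogate recalculates the model after loading, a small tolerance is used here instead of exact equality; whether exact equality would hold depends on the training being fully deterministic.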
comment:9 Changed 9 years ago by gkronber
comment:10 Changed 9 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from reviewing to assigned
GradientBoostedTreesAlgorithmStatic.TrainGbm should return a GBT solution. SurrogateGBTModel should implement IGBTModel and forward calls to the actual model. Remove the getter for the actual model from SurrogateGBTModel. Adapt the unit test accordingly.
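The design requested here could look roughly like the following self-contained sketch. All type names (IModel, ActualModel, SurrogateModel) are hypothetical stand-ins and not the HeuristicLab classes, and "training" is faked with a seeded random number generator. The sketch only illustrates the pattern: the surrogate keeps the small set of parameters needed to rebuild the model, rebuilds it on first use, exposes no getter for it, and forwards interface calls to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the model interface (not IGBTModel itself).
public interface IModel {
  IEnumerable<double> GetEstimatedValues(IEnumerable<double[]> rows);
}

// Hypothetical "actual" model whose large state should not be persisted.
public sealed class ActualModel : IModel {
  private readonly double[] weights;
  public ActualModel(double[] weights) { this.weights = weights; }
  public IEnumerable<double> GetEstimatedValues(IEnumerable<double[]> rows) {
    return rows.Select(r => r.Zip(weights, (x, w) => x * w).Sum());
  }
}

// Surrogate: only 'seed' and 'nWeights' would need to be stored.
public sealed class SurrogateModel : IModel {
  private readonly int seed;
  private readonly int nWeights;
  private readonly Lazy<ActualModel> actualModel; // built on first access, never exposed
  public int RebuildCount { get; private set; }   // only here to make the laziness observable

  public SurrogateModel(int seed, int nWeights) {
    this.seed = seed;
    this.nWeights = nWeights;
    actualModel = new Lazy<ActualModel>(Rebuild);
  }

  // Deterministic "training": the same seed yields the same model within one runtime.
  private ActualModel Rebuild() {
    RebuildCount++;
    var rand = new Random(seed);
    var weights = Enumerable.Range(0, nWeights).Select(_ => rand.NextDouble()).ToArray();
    return new ActualModel(weights);
  }

  // Forward the interface call to the (lazily rebuilt) actual model.
  public IEnumerable<double> GetEstimatedValues(IEnumerable<double[]> rows) {
    return actualModel.Value.GetEstimatedValues(rows);
  }
}
```

Since callers only see IModel, they cannot tell whether they hold the actual model or the surrogate, which is the point of removing the getter for the wrapped model.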
comment:11 Changed 9 years ago by gkronber
r13157: made the changes suggested by mkommend in the review. This is definitely a big improvement, thx!
comment:12 Changed 9 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from assigned to reviewing
comment:13 Changed 9 years ago by gkronber
r13158: copy constructor fix
comment:14 Changed 9 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from reviewing to readytorelease
comment:15 Changed 9 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
r13184: merged r12868,r12873,r12875,r13065:13066,r13157:13158 from trunk to stable
r12868: introduced surrogate for GBT-models which recalculates the actual model on demand to improve persistence of GBT solutions