Opened 2 years ago

Closed 20 months ago

#2450 closed defect (done)

Persistence of gradient boosted trees solutions takes a long time and creates really big files

Reported by: gkronber Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.13
Component: Algorithms.DataAnalysis Version: 3.3.12
Keywords: Cc:

Description

The causes are the same as for random forests (see #1721).

Change History (15)

comment:1 Changed 23 months ago by gkronber

  • Owner set to gkronber
  • Status changed from new to accepted

comment:2 Changed 23 months ago by gkronber

r12868: introduced a surrogate for GBT models that recalculates the actual model on demand, to improve the persistence of GBT solutions.
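The surrogate idea can be sketched roughly as follows. This is a minimal, self-contained C# sketch, not the HeuristicLab implementation: all type names (IModelSketch, EnsembleModel, TrainingParameters, Trainer, ModelSurrogate) are hypothetical. It assumes that training is deterministic for a fixed seed, so the recalculated model matches the original one; in a real setting the training data would also have to be persisted alongside the parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model interface (stands in for a real model interface such as IGBTModel).
public interface IModelSketch {
  double Predict(double x);
}

// Hypothetical "large" model: a long list of component weights that would be
// expensive to serialize.
public sealed class EnsembleModel : IModelSketch {
  private readonly List<double> weights;
  public EnsembleModel(List<double> weights) { this.weights = weights; }
  public double Predict(double x) => weights.Sum(w => w * x);
}

// The few values the surrogate persists instead of the full model (hypothetical).
public sealed class TrainingParameters {
  public int Seed { get; }
  public int Iterations { get; }
  public double LearningRate { get; }
  public TrainingParameters(int seed, int iterations, double learningRate) {
    Seed = seed; Iterations = iterations; LearningRate = learningRate;
  }
}

public static class Trainer {
  // Counts training runs so the lazy recalculation can be observed.
  public static int TrainingRuns;

  // Deterministic toy "training": the same parameters always yield the same model.
  public static EnsembleModel Train(TrainingParameters p) {
    TrainingRuns++;
    var rand = new Random(p.Seed);
    var weights = new List<double>(p.Iterations);
    for (int i = 0; i < p.Iterations; i++)
      weights.Add(p.LearningRate * rand.NextDouble());
    return new EnsembleModel(weights);
  }
}

// Surrogate: stores only the parameters, recalculates the actual model on first
// use, and forwards all calls to it. The actual model is deliberately not exposed.
public sealed class ModelSurrogate : IModelSketch {
  public TrainingParameters Parameters { get; }
  private readonly Lazy<EnsembleModel> actualModel;

  public ModelSurrogate(TrainingParameters parameters) {
    Parameters = parameters;
    actualModel = new Lazy<EnsembleModel>(() => Trainer.Train(parameters));
  }

  public double Predict(double x) => actualModel.Value.Predict(x);
}
```

In this sketch, constructing (or loading) the surrogate is cheap; the first Predict call pays the training cost once, and later calls reuse the cached model. The design trades recomputation after loading for smaller, faster-to-write files.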

comment:3 Changed 23 months ago by gkronber

r12873: derived ILossFunction from IItem to allow execution on Hive without the privileged flag (this made an "after deserialization" hook necessary to convert the parameter type).

comment:4 Changed 23 months ago by gkronber

r12875: fixed constructors for loss functions

comment:5 Changed 21 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:6 Changed 21 months ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

Reviewed r12868, r12873, and r12875.

It is a good idea to save the algorithm's parameterization instead of the whole model. However, I would remove usages of GBTModel wherever possible (deprecate?). For example, GBTAlgorithmStatic.TrainGbm returns a GBTModel rather than a surrogate; such a model can no longer be loaded and is therefore useless.

comment:7 Changed 20 months ago by gkronber

using System;
using System.Linq;
using System.Collections.Generic;

using HeuristicLab.Core;
using HeuristicLab.Common;

using HeuristicLab.Algorithms.DataAnalysis;
using HeuristicLab.Problems.DataAnalysis;

public class MyScript : HeuristicLab.Scripting.CSharpScriptBase {
  public override void Main() {
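    // read the regression problem data from the script variable "problemData"
    var data = (IRegressionProblemData)vars["problemData"];
    // train a GBT model with squared error loss and fixed parameter values
    var model = GradientBoostedTreesAlgorithmStatic.TrainGbm(data, new SquaredErrorLoss(), 100, 0.1, 1, 1, 10);
    // store the model in the script variable "model" so it can be inspected and saved
    vars["model"] = model;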
  }
}

comment:8 Changed 20 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

r13065:

  • Marked the constructor of GBTModel as obsolete.
  • Wrapped GBTModels in GBTModelSurrogates where necessary in the API.
  • Removed an unused internal method from the API.
  • Removed the explicit usage of GBTModelSurrogate in GBTAlgorithm because gbtState.GetModel() already returns the correct type.

The changes can be tested with the script above and with the GBT algorithm in HeuristicLab:

  1. Run the algorithm (either via the script or through the GUI).
  2. Store the solution to disk and load it again.
  3. Compare the estimated values before and after loading.

comment:9 Changed 20 months ago by gkronber

r13066: adapted unit test for r13065.

comment:10 Changed 20 months ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

Reviewed r13065 & r13066.

GradientBoostedTreesAlgorithmStatic.TrainGbm should return a GBT solution. SurrogateGBTModel should implement IGBTModel and forward calls to the actual model. Remove the getter for the actual model from SurrogateGBTModel. Adapt the unit test accordingly.

Last edited 20 months ago by mkommend

comment:11 Changed 20 months ago by gkronber

r13157: made the changes suggested by mkommend in the review. This is definitely a big improvement, thx!

comment:12 Changed 20 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

comment:13 Changed 20 months ago by gkronber

r13158: copy constructor fix

comment:14 Changed 20 months ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

Reviewed r13157 and r13158.

comment:15 Changed 20 months ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed

r13184: merged r12868, r12873, r12875, r13065:13066, and r13157:13158 from trunk to stable
