Opened 2 years ago

Closed 2 years ago

#2450 closed defect (done)

Persistence of gradient boosted trees solutions takes a long time and creates really big files

Reported by: gkronber
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.13
Component: Algorithms.DataAnalysis
Version: 3.3.12
Keywords:
Cc:

Description

The reasons are the same as for random forests (see #1721).

Change History (15)

comment:1 Changed 2 years ago by gkronber

  • Owner set to gkronber
  • Status changed from new to accepted

comment:2 Changed 2 years ago by gkronber

r12868: introduced a surrogate for GBT models that recalculates the actual model on demand, to improve the persistence of GBT solutions
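
The surrogate idea can be illustrated with a minimal, self-contained sketch. This is not the HeuristicLab implementation: ToyModel, ToyModelSurrogate, and the seed-based "training" are hypothetical stand-ins, and the sketch assumes that training is deterministic for a given parameterization and seed. Only the parameters would be persisted; the model itself is rebuilt on first use.

```csharp
using System;

// Hypothetical stand-in for a model that is expensive to persist.
public sealed class ToyModel {
  private readonly double[] weights;
  public ToyModel(double[] weights) { this.weights = weights; }

  public double Predict(double x) {
    double sum = 0.0;
    foreach (var w in weights) sum += w * x;
    return sum;
  }
}

// Hypothetical surrogate: keeps only what is needed to rebuild the model.
public sealed class ToyModelSurrogate {
  // The only state that would have to be written to disk.
  public int Seed { get; }
  public int Iterations { get; }

  // Not persisted; created the first time a prediction is requested.
  private readonly Lazy<ToyModel> model;

  public ToyModelSurrogate(int seed, int iterations) {
    Seed = seed;
    Iterations = iterations;
    model = new Lazy<ToyModel>(() => Train(seed, iterations));
  }

  // Deterministic "training": the same seed and iteration count
  // always produce the same weights.
  private static ToyModel Train(int seed, int iterations) {
    var rand = new Random(seed);
    var weights = new double[iterations];
    for (int i = 0; i < iterations; i++) weights[i] = rand.NextDouble();
    return new ToyModel(weights);
  }

  public bool IsModelCreated => model.IsValueCreated;

  public double Predict(double x) => model.Value.Predict(x);
}
```

A surrogate rebuilt from the stored Seed and Iterations gives the same predictions as the original, at the cost of retraining once after loading.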

comment:3 Changed 2 years ago by gkronber

r12873: derived ILossFunction from IItem to allow execution on Hive without the privileged flag (this made an after-deserialization hook necessary to convert the parameter type)

comment:4 Changed 2 years ago by gkronber

r12875: fixed constructors for loss functions

comment:5 Changed 2 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:6 Changed 2 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

Reviewed r12868, r12873, and r12875.

It is a good idea to save the algorithm's parameterization instead of the whole model. However, I would remove usages of the GBTModel wherever possible (deprecate?). For example, GBTAlgorithmStatic.TrainGbm returns a GBTModel rather than a surrogate; such a model cannot be loaded anymore and is therefore useless.

comment:7 Changed 2 years ago by gkronber

using System;
using System.Linq;
using System.Collections.Generic;

using HeuristicLab.Core;
using HeuristicLab.Common;

using HeuristicLab.Algorithms.DataAnalysis;
using HeuristicLab.Problems.DataAnalysis;

public class MyScript : HeuristicLab.Scripting.CSharpScriptBase {
  public override void Main() {
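    // Read the regression problem data from the script's variable store.
    var data = (IRegressionProblemData)vars["problemData"];
    // Train a GBT model with squared error loss through the static API.
    var model = GradientBoostedTreesAlgorithmStatic.TrainGbm(data, new SquaredErrorLoss(), 100, 0.1, 1, 1, 10);
    // Store the result so that it can be saved to disk and loaded again.
    vars["model"] = model;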
  }
}

comment:8 Changed 2 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

r13065:

  • Marked the constructor of GBTModel as obsolete.
  • Wrapped GBTModels in GBTModelSurrogates where necessary in the API.
  • Removed an unused internal method from the API.
  • Removed the explicit usage of GBTModelSurrogate in GBTAlgorithm because gbtState.GetModel() already returns the correct type.

The changes can be tested with the script above (comment:7) and the GBT algorithm in HeuristicLab.

  1. Run the algorithm (either script or through GUI)
  2. Store and load the solution from disk
  3. Compare estimated values
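
Step 3 can be sketched as a small helper that compares the estimated values obtained before saving with those obtained after loading. EstimatedValuesComparer is a hypothetical name, not part of HeuristicLab, and how the two sequences are obtained from the models is left out here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for comparing two sets of estimated values.
public static class EstimatedValuesComparer {
  // Returns the largest absolute difference between two sequences
  // of the same length; throws if the lengths differ.
  public static double MaxAbsDifference(IEnumerable<double> before, IEnumerable<double> after) {
    var a = before.ToArray();
    var b = after.ToArray();
    if (a.Length != b.Length)
      throw new ArgumentException("Sequences differ in length.");

    double max = 0.0;
    for (int i = 0; i < a.Length; i++)
      max = Math.Max(max, Math.Abs(a[i] - b[i]));
    return max;
  }
}
```

If the surrogate retrains the model deterministically, the result should be 0 or very close to it; a noticeably larger value would suggest that the reloaded solution does not reproduce the original model.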

comment:9 Changed 2 years ago by gkronber

r13066: adapted unit test for r13065.

comment:10 Changed 2 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

Reviewed r13065 & r13066.

GradientBoostedTreesAlgorithmStatic.TrainGbm should return a GBT solution. GBTModelSurrogate should implement IGBTModel and forward calls to the actual model. Remove the getter for the actual model from GBTModelSurrogate. Adapt the unit test accordingly.
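
The suggested design could look roughly like the following sketch. IToyModel, ActualToyModel, and ForwardingSurrogate are hypothetical names used only for illustration; the real interface has more members. The surrogate implements the same interface as the wrapped model, forwards every call, and deliberately exposes no getter for the wrapped instance.

```csharp
using System;

// Hypothetical interface standing in for the model interface.
public interface IToyModel {
  double Predict(double x);
}

// Hypothetical actual model.
public sealed class ActualToyModel : IToyModel {
  private readonly double slope;
  public ActualToyModel(double slope) { this.slope = slope; }
  public double Predict(double x) => slope * x;
}

// Wraps an actual model and implements the same interface.
public sealed class ForwardingSurrogate : IToyModel {
  // Private on purpose: callers cannot get hold of the wrapped model.
  private readonly IToyModel actual;

  public ForwardingSurrogate(IToyModel actual) {
    this.actual = actual ?? throw new ArgumentNullException(nameof(actual));
  }

  // Every interface member simply forwards to the wrapped model.
  public double Predict(double x) => actual.Predict(x);
}
```

Because callers only ever see the interface, an API method can hand out the surrogate wherever a model is expected, and the wrapped model never leaks into code that might try to persist it directly.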

Last edited 2 years ago by mkommend

comment:11 Changed 2 years ago by gkronber

r13157: made the changes suggested by mkommend in the review. This is definitely a big improvement, thx!

comment:12 Changed 2 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

comment:13 Changed 2 years ago by gkronber

r13158: copy constructor fix

comment:14 Changed 2 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

Reviewed r13157 and r13158.

comment:15 Changed 2 years ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed

r13184: merged r12868, r12873, r12875, r13065:13066, r13157:13158 from trunk to stable
