Opened 3 years ago

Last modified 3 months ago

#2898 reviewing enhancement

Generalized additive models (GAM)

Reported by: gkronber
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.17
Component: Algorithms.DataAnalysis
Version: trunk
Keywords:
Cc:

Description

Generalized additive models would be a great addition to the set of data-based modeling algorithms.

Feature wishlist:

  • The base learner for the terms is configurable (default: smoothing spline or penalized regression spline). For example, it would be great if we could use an efficient symbolic regression solver as the base learner.
  • Individually adjustable smoothing or regularization parameter for each term.
  • Automatic selection of the smoothing or regularization parameter for each term, ideally based on generalized cross-validation (GCV).
  • The variables allowed in each term are configurable.
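
As a rough illustration of the GCV item above, the following sketch selects the smoothing parameter of a single term by minimizing GCV(h) = n * RSS / (n - tr(S))^2, where S is the smoother matrix. It uses a Gaussian kernel smoother as a stand-in for the spline base learner, only because its smoother matrix is easy to write down. All function names and the data are hypothetical and not part of HeuristicLab or ALGLIB.

```python
import math
import random


def smoother_matrix(x, h):
    """Row-normalized Gaussian kernel weights, so that y_hat = S y (a linear smoother)."""
    S = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        total = sum(w)
        S.append([wj / total for wj in w])
    return S


def gcv_score(x, y, h):
    """GCV(h) = n * RSS / (n - tr(S))^2 for the smoother with bandwidth h."""
    n = len(x)
    S = smoother_matrix(x, h)
    y_hat = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    trace = sum(S[i][i] for i in range(n))  # effective degrees of freedom
    return n * rss / (n - trace) ** 2


def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest GCV score."""
    return min(candidates, key=lambda h: gcv_score(x, y, h))


# Synthetic data, for illustration only.
rng = random.Random(0)
x = [i / 49 * 2 * math.pi for i in range(50)]
y = [math.sin(xi) + rng.gauss(0.0, 0.2) for xi in x]
candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
best_h = select_bandwidth(x, y, candidates)
print("bandwidth selected by GCV:", best_h)
```

For a penalized regression spline the same score applies with the penalization parameter in place of the bandwidth h, provided the trace of the smoother matrix is available.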

Idea for a first prototype:

  • Only uni-variate terms are allowed
  • Use alglib penalized regression spline for each term
  • The variables together with penalization parameters for each term are read from a list (algorithm parameter)
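
A minimal sketch of how such a prototype could be fitted, assuming backfitting is used (the ticket does not say how the terms are estimated). Each uni-variate term is fitted to the partial residuals of the other terms, and the terms are read from a list of (variable, smoothing parameter) pairs. A Gaussian kernel smoother stands in for the alglib penalized regression spline; the function names and the data are hypothetical.

```python
import math
import random


def kernel_smooth(x, r, h):
    """Gaussian kernel smoother of r on x; stand-in for a penalized spline."""
    fitted = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        fitted.append(sum(wj * rj for wj, rj in zip(w, r)) / sum(w))
    return fitted


def fit_gam(columns, y, terms, iterations=10):
    """Backfitting; terms is a list of (variable name, smoothing parameter)."""
    n = len(y)
    intercept = sum(y) / n
    f = {name: [0.0] * n for name, _ in terms}
    for _ in range(iterations):
        for name, h in terms:
            # Partial residuals: remove the intercept and all other terms.
            r = [y[i] - intercept
                 - sum(f[other][i] for other, _ in terms if other != name)
                 for i in range(n)]
            g = kernel_smooth(columns[name], r, h)
            mean_g = sum(g) / n
            f[name] = [gi - mean_g for gi in g]  # center each term
    return intercept, f


def predict_training(intercept, f):
    """Fitted values on the training rows: intercept plus the sum of all terms."""
    n = len(next(iter(f.values())))
    return [intercept + sum(term[i] for term in f.values()) for i in range(n)]


# Synthetic data with two additive effects, for illustration only.
rng = random.Random(1)
n_rows = 100
columns = {
    "x1": [rng.uniform(-3.0, 3.0) for _ in range(n_rows)],
    "x2": [rng.uniform(-2.0, 2.0) for _ in range(n_rows)],
}
y = [math.sin(a) + 0.5 * b * b + rng.gauss(0.0, 0.1)
     for a, b in zip(columns["x1"], columns["x2"])]

# One entry per term: the variable and its own smoothing parameter.
terms = [("x1", 0.3), ("x2", 0.3)]
intercept, f = fit_gam(columns, y, terms)
fitted = predict_training(intercept, f)
```

Keeping the smoothing parameter inside each list entry means the per-term adjustment from the wishlist needs no change to the fitting loop.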

Change History (16)

comment:1 Changed 3 years ago by gkronber

  • Owner set to gkronber
  • Status changed from new to accepted

comment:2 Changed 3 years ago by gkronber

r15774: created branch

comment:3 Changed 3 years ago by gkronber

  • Owner changed from gkronber to lkammere
  • Status changed from accepted to reviewing

r15775: added simple implementation of GAM based on uni-variate penalized regression splines with the same penalization factor for each term

comment:4 Changed 3 years ago by gkronber

  • Milestone changed from HeuristicLab 4.x Backlog to HeuristicLab 3.3.16
  • Owner changed from lkammere to gkronber
  • Status changed from reviewing to assigned

comment:5 Changed 2 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.16 to HeuristicLab 3.3.x Backlog

comment:6 Changed 6 months ago by gkronber

r17812: copied implementation from branch to trunk.

comment:7 Changed 6 months ago by gkronber

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.17
  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing
  • Version changed from branch to trunk

comment:8 Changed 6 months ago by gkronber

r17813: delete branch

comment:9 Changed 6 months ago by gkronber

r17815: fix header

comment:10 Changed 4 months ago by mkommend

r17839: Fixed base ctor call in Spline1dModel.

comment:11 Changed 4 months ago by mkommend

Review comments

  • What's the point of the ToArray call within GetEstimatedValues (Spline1dModel line 74)? (-> this was an artifact from older code, fixed in r17867)
Last edited 3 months ago by mkommend

comment:12 Changed 4 months ago by gkronber

r17867: simplified code in Spline1dModel

comment:13 Changed 3 months ago by mkommend

r17888: Corrected calculation of MSE and RMSE in GAMs by implementing helper methods for their calculation.

The previous implementation used the standard deviation or variance of the residuals. However, stddev(res) == RMSE and var(res) == MSE hold only if mean(res) == 0.0. In practice, the mean is not exactly zero on the training data because of numerical differences, and it is almost never zero on the test data (e.g. because of data shifts). Hence the RMSE and MSE have to be calculated the traditional way.

@gkronber I am unsure about the value in the RSS table (line 201). Previously it contained var(res), so I changed it to the MSE. However, the name RSS suggests residual sum of squares, so it should contain MSE * n. Please comment on, correct, or rename this. Maybe I am missing something obvious.
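
The relationships in this comment can be checked with a small sketch. It assumes the population variance (division by n); with the sample variance (division by n - 1) the identity would not hold exactly even for zero-mean residuals. The helper names and the residual values are made up for illustration and are not the helper methods added in r17888.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mse(residuals):
    """Mean of the squared residuals, without subtracting their mean."""
    return sum(r ** 2 for r in residuals) / len(residuals)


def rmse(residuals):
    return math.sqrt(mse(residuals))


def rss(residuals):
    """Residual sum of squares; equals n * MSE."""
    return sum(r ** 2 for r in residuals)


# Residuals with a non-zero mean, e.g. test data with a shift (made-up numbers).
residuals = [0.5, 1.5, 1.0, 2.0, 0.0]

# var(res) = MSE - mean(res)^2, so the variance underestimates the MSE here.
print("mean(res):", mean(residuals))
print("var(res): ", variance(residuals))
print("MSE:      ", mse(residuals))
print("RMSE:     ", rmse(residuals))
print("RSS:      ", rss(residuals))

# After removing the mean, variance and MSE coincide.
centered = [r - mean(residuals) for r in residuals]
print("MSE of centered residuals:", mse(centered))
```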

comment:14 Changed 3 months ago by mkommend

r17889: Minor changes in Spline1dModel (added field for inputVariable, and named model and solution more appropriately).

comment:15 Changed 3 months ago by mkommend

  • Owner changed from mkommend to gkronber

comment:16 Changed 3 months ago by gkronber

Reviewed r17888:17889.
