Opened 6 months ago

Last modified 3 days ago

#2699 reviewing feature request

Radial Basis Function Regression

Reported by: bwerth
Owned by: bwerth
Priority: medium
Milestone: HeuristicLab 3.3.15
Component: Algorithms.DataAnalysis
Version: branch
Keywords:
Cc:

Description

Radial Basis Functions are yet another type of regression model often used in surrogate modelling. A main benefit is their extensibility to non-continuous domains.

Radial basis function regression, along with suitable kernel functions and distance metrics, shall be added to the existing DataAnalysis solution.

Attachments (1)

Radial Basis Function Regression (RBF-R).hl (4.8 MB) - added by gkronber 4 months ago.
Run with strange LOO and variance (Line chart 95% conf)

Change History (18)

comment:1 Changed 6 months ago by bwerth

  • Owner set to bwerth
  • Status changed from new to assigned

comment:2 Changed 6 months ago by bwerth

  • Status changed from assigned to accepted
  • Version changed from 3.3.14 to branch

comment:3 Changed 6 months ago by bwerth

r14385 created branch for radial basis functions regression

comment:4 Changed 6 months ago by bwerth

r14386 moved RadialBasisFunctions from Problems.SurrogateProblem to Algorithms.DataAnalysis

comment:5 Changed 4 months ago by gkronber

r14500: fixed a small typo and added .sln file while reviewing

comment:6 Changed 4 months ago by gkronber

I tested with the 'Chemical-I' problem instance and the Euclidean norm with beta=2 (see attachment). The predictions look good on the training and test sets, but the estimated variances and the approximated LOO CV error seem to be way off. Could you please check whether both calculations are correct?

comment:7 Changed 4 months ago by gkronber

  • Status changed from accepted to assigned

Changed 4 months ago by gkronber

Run with strange LOO and variance (Line chart 95% conf)

comment:8 Changed 2 weeks ago by gkronber

r14869: merged changesets from trunk to branch

comment:9 Changed 2 weeks ago by gkronber

r14870: merged changesets from trunk to branch

comment:10 Changed 2 weeks ago by gkronber

Review comments:

  • LINQ is used a lot in combination with matrix operations. This is often slow because of memory allocations required for enumerators. (DONE)
  • There should be an option to scale the input variables (scaling should be active by default) (DONE)
  • RBF regression does not support noise. If there are duplicate x vectors, model building fails. An option would be to add a diagonal matrix to the gram matrix (leading to kernel ridge regression?) (DONE)
  • I have not found a source for the calculation of variance and LOO error (DONE, removed LOO calculation)
  • Don't know how to best unify covariance functions and kernel functions (there is some duplication) (DONE).
  • The calculation of the covariance matrix takes a lot of time (10x longer than the equivalent calculation when using an equivalent covariance matrix). I suspect that the reason is the rather general implementation for distance calculation. (DONE)
  • Beta should be a parameter of the algorithm instead of the kernel to make it easier to run a grid test. (DONE)
  • Several of the implemented kernels are only conditionally positive definite. See http://num.math.uni-goettingen.de/schaback/teaching/sc.pdf for a definition of the kernels and valid beta values. Additionally, the basis functions must be extended for these kernels depending on the value of beta.

Made a number of changes in r14872, mainly refactoring the RBF model.
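
The duplicate-x issue from the review list can be illustrated with a small Python sketch. This is not the HeuristicLab (C#) code: the Gaussian kernel, the ridge value lam=1e-3, the toy data and all function names are assumptions chosen for illustration. Without the diagonal term, two identical x vectors produce two identical rows in the gram matrix, so the linear system is singular; adding lam to the diagonal makes it solvable.

```python
import math

def gaussian_kernel(a, b, beta=1.0):
    # Assumed kernel: exp(-||a - b||^2 / beta^2), beta acting as a length scale.
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (beta * beta))

def gram_matrix(xs, lam=0.0, beta=1.0):
    # Kernel matrix with lam added to the diagonal (the ridge term).
    n = len(xs)
    return [[gaussian_kernel(xs[i], xs[j], beta) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def solve(a, b):
    # Gaussian elimination with partial pivoting; ValueError if singular.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def predict(x, xs, alpha, beta=1.0):
    # Model output: weighted sum of kernel evaluations against training points.
    return sum(a * gaussian_kernel(x, xi, beta) for a, xi in zip(alpha, xs))

# Toy data with a duplicated input (x = 1.0) and two different noisy targets.
xs = [[0.0], [1.0], [1.0], [2.0]]
ys = [0.0, 1.0, 1.2, 2.0]

try:
    solve(gram_matrix(xs), ys)            # no ridge term
    plain_rbf_failed = False
except ValueError:
    plain_rbf_failed = True

alpha = solve(gram_matrix(xs, lam=1e-3), ys)   # with ridge term
pred_at_duplicate = predict([1.0], xs, alpha)
```

With the ridge term, the prediction at the duplicated input lands close to the mean of the two conflicting targets instead of the fit failing.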

Last edited 5 days ago by gkronber

comment:11 Changed 5 days ago by bwerth

r14883 checked and reformulated gradient functions for kernels

comment:12 Changed 5 days ago by gkronber

r14884: renamed folder. RBF regression can be seen as a special case of kernelized ridge regression.

comment:13 Changed 5 days ago by gkronber

r14885: renamed and moved files

comment:14 Changed 5 days ago by gkronber

r14887: worked on kernel ridge regression.

  • moved beta parameter to algorithm.
  • reintroduced IKernel interface to restrict choice of kernel in kernel ridge regression.
  • speed-up by Cholesky decomposition and optimization of the calculation of the covariance matrix.

comment:15 Changed 4 days ago by gkronber

  • Status changed from assigned to reviewing

r14888: re-added calculation of leave one out cv estimate
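
For reference, a leave-one-out estimate for kernel ridge regression does not require refitting the model n times. A minimal Python sketch follows, assuming the standard closed-form identity e_i = alpha_i / [(K + lam*I)^-1]_ii for a model without intercept; the ticket does not confirm that r14888 uses this formula, and the Gaussian kernel, data and function names are illustrative assumptions. The brute-force variant is included only to check the shortcut.

```python
import math

def kernel(a, b, beta=1.0):
    # Assumed Gaussian kernel on scalar inputs.
    return math.exp(-((a - b) ** 2) / (beta * beta))

def invert(a):
    # Gauss-Jordan inversion with partial pivoting.
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit(xs, ys, lam, beta=1.0):
    # Returns the weights alpha and the inverse of (K + lam * I).
    n = len(xs)
    g = [[kernel(xs[i], xs[j], beta) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    g_inv = invert(g)
    alpha = [sum(g_inv[i][j] * ys[j] for j in range(n)) for i in range(n)]
    return alpha, g_inv

def loo_residuals_closed_form(xs, ys, lam, beta=1.0):
    # One fit only: e_i = alpha_i / [(K + lam * I)^-1]_ii
    alpha, g_inv = fit(xs, ys, lam, beta)
    return [alpha[i] / g_inv[i][i] for i in range(len(xs))]

def loo_residuals_brute_force(xs, ys, lam, beta=1.0):
    # Refit n times, each time leaving out one sample and predicting it.
    res = []
    for i in range(len(xs)):
        xt, yt = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        alpha, _ = fit(xt, yt, lam, beta)
        pred = sum(a * kernel(xs[i], x, beta) for a, x in zip(alpha, xt))
        res.append(ys[i] - pred)
    return res

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.0, 0.4, 1.1, 1.4, 2.1]
fast = loo_residuals_closed_form(xs, ys, lam=0.1)
slow = loo_residuals_brute_force(xs, ys, lam=0.1)
loo_mse = sum(e * e for e in fast) / len(fast)
```

Both variants should agree up to rounding; the closed form needs a single matrix inversion, which is why it is attractive as a cheap CV estimate.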

comment:16 Changed 3 days ago by bwerth

r14891 reworked kernel functions (beta is always a scaling factor now); added LU decomposition as a fallback if Cholesky decomposition fails
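
The fallback idea can be sketched in Python as follows. This is only an illustration under assumptions, not the code from r14891: the function names are hypothetical, and the LU step is written as Gaussian elimination with partial pivoting (equivalent to a row-pivoted LU solve without storing the factors). Cholesky requires a positive definite matrix, so a gram matrix with a zero diagonal, as can occur with kernels that are only conditionally positive definite, makes it fail and triggers the fallback.

```python
import math

def cholesky_solve(a, b):
    # Solve a x = b via a = L L^T; ValueError if a is not positive definite.
    n = len(b)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    z = [0.0] * n                      # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    x = [0.0] * n                      # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def lu_solve(a, b):
    # Gaussian elimination with partial pivoting; ValueError if singular.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def solve_with_fallback(a, b):
    # Try the cheaper Cholesky path first, fall back to LU if it fails.
    try:
        return cholesky_solve(a, b), "cholesky"
    except ValueError:
        return lu_solve(a, b), "lu"

# Positive definite system: handled by Cholesky.
x_pd, method_pd = solve_with_fallback([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0])
# Symmetric but indefinite system (zero diagonal): Cholesky fails, LU is used.
x_ind, method_ind = solve_with_fallback([[0.0, 1.0], [1.0, 0.0]], [3.0, 5.0])
```

Trying Cholesky first keeps the common positive definite case fast, while the pivoted elimination still handles symmetric matrices that are not positive definite.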

Last edited 3 days ago by bwerth

comment:17 Changed 3 days ago by gkronber

r14892: made some adjustments after bwerth's changes (mainly formatting)
