Changeset 15465


Ignore:
Timestamp:
11/08/17 20:23:37 (3 years ago)
Author:
gkronber
Message:

Started a summary of findings.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • branches/MathNetNumerics-Exploration-2789/HeuristicLab.Algorithms.DataAnalysis.Experimental/Splines.cs

    r15457 r15465  
    3636using MathNet.Numerics.LinearAlgebra.Double;
    3737using MathNet.Numerics.LinearAlgebra.Double.Solvers;
     38
     39
     40/*
     41 * Experimenting with various spline implementations.
     42 *
     43 * There are many different variants of using splines.
     44 *  - Spline Interpolation
     45 *  - Natural Spline Interpolation
     46 *  - Smoothing Splines (Reinsch)
     47 *  - P-Splines
     48 *  - B-Splines
     49 *  - Regression Splines
     50 *  - Penalized Regression Splines
     51 *  -
     52 *
     53 *
     54 *  Generally, a spline is a piecewise function whereby each piece connects two knots.
     55 *  In almost all cases splines are univariate. There are extensions to more dimensions
     56 *  but this is much less convenient.
     57 *  A linear spline is a piecewise linear function. Usually the following constaints are enforced
     58 *  for each knot point k and the two piecewise functions fl and fr
     59 *  fl(k) = fr(k), fl(k)' = fr(k)', fl(k)''=0, fr(k)''=0, additionally constraints for
     60 *  the boundary knots are often enforced.
     61
     62 *  Spline Interpolation solves the problem of finding a smooth interpolating function
     63 *  which goes through all knots exactly.
     64 *
     65 *  In spline smoothing one tries to find a function minimizing the squared residuals
     66 *  and maximizing smoothness. For the smoothing spline g the optimization objective is
     67 *  || y - g(x) ||² + \lambda integrate (xmin : xmax) g(x)'' dx.
     68 *  The roughness penality is the integral over the second derivative of the smoothing spline
     69 *  over the whole domain of x. Lambda is used to balance the amount of smoothing.
     70 *  If lambda is set to zero, the solution is an interpolating spline. If lambda is set to
     71 *  infinity the solution is a maximally smooth function (= line).
     72 *  Interestingly, the estimation of a smoothing spline is a linear operation. As a consequence
     73 *  a lot of theory from linear models can be reused (e.g. cross-validation and generalized
     74 *  cross-validation) with minimal additional effort. GCV can also be used to tune the
     75 *  smoothing parameter automatically. Tuned algorithms are available to solve smoothing
     76 *  spline estimation efficiently O(n) or O(m²n) where m is the number of knots and n the
     77 *  number of data points.
     78 *  It is possible to calculate confidence intervals for the smoothing spline and given
     79 *  an error model subsequently also confidence intervals for predictions.
     80 *
     81 *  We are primarily interested in spline smoothing / regression.
     82 *  Alglib and Math.NET provide several different implementations for interpolation
     83 *  including polynomial (cubic), Hermite, Akima, Linear, Catmull-Rom, Rational, ...
     84 *
     85 *  For spline smoothing or regression there are also many differen variants.
     86 *
     87 *  Smoothing Splines by Reinsch (Reinsch, Smoothing by Spline Functions, 1967).
     88 *  Pseudo-code for the algorithm is given in the paper. A small change in the algorithm
     89 *  to improve convergence is discussed in (Reinsch, Smoothing by Spline Functions II, 1970)
     90 *  The algorithm uses the properties of the smoothing matrix (bandedness) to calculate the
     91 *  smoothing spline in O(n). Two parameters have to be set (S and rho). If the noise in the
     92 *  data is known then both parameter can be set directly, otherwise S should be set to n-1
     93 *  and rho should be found via cross-validation.
     94 *  It is not easy to calculate a the CV (PRESS) or GCV in the approach by Reinsch and
     95 *  therefore the algorithm is not ideal.
     96 *  Wahba and colleagues (1990) introduced CV and GCV estimation.
     97 *  Hutchinson and colleages improved the algorithm (Fortran implementation available as
     98 *  algorithm 642.f from ACM).
     99 *
     100 *  CUBGCV (Algorithm 642 Hutchingson)
     101 *  Several improvments over the Reinsch algorithm. In particular efficient calculation
     102 *  of the
     103 
     104 *  Finbarr O'Sullivan BART
     105 *  ...
     106 *
     107 *  B-Splines
     108 *  ...
     109 *
     110 *  P-Splines (Eilers and Marx), use the B-Spline basis in combination with a penality
     111 *  for large variations of subsequent B-Spline coefficients. Conceptually, if the
     112 *  coefficients of neighbouring B-Splines are almost the same then the resulting smoothing
     113 *  spline is less wiggly. It has been critizized that this is rather subjective.
     114 *  P-Splines also allow calculation of CV and GCV for a given smoothing parameter which
     115 *  can be used for optimization of lambda.
     116 *  It is easy to use P-Splines within GAMs (no backfitting necessary).
     117 *  The paper contains MATLAB code also for the calculation of the B-Spline basis.
     118 *  The computational effort is O(?). The authors argue that it is efficient.
     119 *  The number of knots is not critical it is possible to use every data point as knot.
     120 *
     121 *  Alglib: penalized regression spline
     122 *  ...
     123 *
     124 *  Alglib: penalized regression spline with support for weights
     125 *  ...
     126 */
    38127
    39128namespace HeuristicLab.Algorithms.DataAnalysis.Experimental {
Note: See TracChangeset for help on using the changeset viewer.