HeuristicLab - Blog
http://dev.heuristiclab.com/trac.fcgi/blog
Gaussian Processes for Regression and Classification
gkronber
Sun, 06 Oct 2013 14:40:00 GMT
http://dev.heuristiclab.com/trac.fcgi/blog/gkronber/gaussian_processes
<p>
With the latest HeuristicLab version 3.3.8 we released an implementation of Gaussian process models for regression analysis. Our purely managed C# implementation is mainly based on the MATLAB <a class="ext-link" href="http://www.gaussianprocess.org/gpml/code"><span class="icon"></span>implementation</a> by Rasmussen and Nickisch accompanying the book "Gaussian Processes for Machine Learning" by Rasmussen and Williams <a class="ext-link" href="http://www.gaussianprocess.org/gpml/"><span class="icon"></span>(available online)</a>.
</p>
<p>
If you want to try Gaussian process regression in HeuristicLab, simply open the preconfigured sample. You can also import a CSV file with your own data.
</p>
<p>
The Gaussian process model can be viewed as a Bayesian prior distribution over functions and is related to Bayesian linear regression.
</p>
<p>
Samples from two different one-dimensional Gaussian processes:
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/gaussian_processes/GP%20samples%20I.png"><img width="450px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/gaussian_processes/GP%20samples%20I.png" /></a>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/gaussian_processes/GP%20samples%20II.png"><img width="450px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/gaussian_processes/GP%20samples%20II.png" /></a>
</p>
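<p>
Samples like these can be drawn by evaluating the covariance function on a grid of inputs and multiplying a Cholesky factor of the resulting matrix with standard normal noise. The following is a minimal sketch in plain Python, not the HeuristicLab implementation. It assumes a zero mean function and a squared-exponential covariance; the post does not state which covariance functions produced the figures, and all function names are illustrative.
</p>

```python
import math
import random

def sq_exp_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    d = x1 - x2
    return signal_var * math.exp(-0.5 * d * d / length_scale ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def sample_gp_prior(xs, length_scale, rng, jitter=1e-6):
    """Draw one function sample f(xs) from a zero-mean GP prior."""
    n = len(xs)
    # Covariance matrix of the function values; jitter keeps it numerically
    # positive definite.
    k = [[sq_exp_kernel(xs[i], xs[j], length_scale) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    l = cholesky(k)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # f = L z has covariance L L^T = K.
    return [sum(l[i][j] * z[j] for j in range(i + 1)) for i in range(n)]

xs = [i * 0.25 for i in range(21)]        # 21 inputs on [0, 5]
rng = random.Random(0)
smooth = sample_gp_prior(xs, 2.0, rng)    # long length scale: slowly varying
wiggly = sample_gp_prior(xs, 0.3, rng)    # short length scale: rapidly varying
```

<p>
Changing the length scale is one way to obtain visibly different processes: a larger value correlates distant inputs more strongly and therefore yields smoother sample functions.
</p>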
<p>
Similarly to other models, such as the SVM, the GP model uses the 'kernel trick' to handle high-dimensional non-linear projections to feature space efficiently.
</p>
<p>
'Fitting' the model means calculating the posterior Gaussian process distribution by conditioning the GP prior distribution on the observed data points in the training set. This leads to a posterior distribution in which functions that go through the observed training points are more likely. From the posterior GP distribution, it is straightforward to calculate the posterior predictive distribution. So, instead of a single point prediction for each test point, it is possible to use the mean of the predictive distribution as the prediction and to calculate confidence intervals for it at each test point.
</p>
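<p>
The conditioning step can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the HeuristicLab code: it assumes a zero mean function, a squared-exponential covariance with fixed hyper-parameters, and Gaussian observation noise, and it uses the usual Cholesky-based solve. The names <tt>gp_predict</tt>, <tt>length_scale</tt> and <tt>noise_var</tt> are made up for this sketch.
</p>

```python
import math

def sq_exp_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    d = x1 - x2
    return signal_var * math.exp(-0.5 * d * d / length_scale ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def solve_lower(l, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y

def solve_upper_t(l, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def gp_predict(train_x, train_y, test_x, length_scale=1.0, noise_var=1e-2):
    """Return (mean, lower, upper) per test point; bounds are a 95% interval
    for the latent function value (observation noise not included)."""
    n = len(train_x)
    k = [[sq_exp_kernel(train_x[i], train_x[j], length_scale)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    l = cholesky(k)
    alpha = solve_upper_t(l, solve_lower(l, train_y))   # (K + noise*I)^-1 y
    results = []
    for xs in test_x:
        k_star = [sq_exp_kernel(xs, xi, length_scale) for xi in train_x]
        mean = sum(ks * a for ks, a in zip(k_star, alpha))
        v = solve_lower(l, k_star)
        var = max(sq_exp_kernel(xs, xs, length_scale) - sum(vi * vi for vi in v), 0.0)
        half_width = 1.96 * math.sqrt(var)
        results.append((mean, mean - half_width, mean + half_width))
    return results

train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [math.sin(x) for x in train_x]
preds = gp_predict(train_x, train_y, [1.0, 10.0])
```

<p>
In this sketch the interval is narrow at a test point that coincides with a training point and widens to roughly the prior uncertainty far away from the data, which is the behaviour described above.
</p>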
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/gaussian_processes/GP%20learning.png"><img width="450px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/gaussian_processes/GP%20learning.png" /></a>
</p>
<p>
The model is non-parametric and is fully specified via a mean function and a covariance function. The mean and covariance functions often have hyper-parameters that have to be optimized to fit the model to a given training data set. For more information check out the <a class="ext-link" href="http://www.gaussianprocess.org/gpml/"><span class="icon"></span>book</a>.
</p>
<p>
In HeuristicLab, the hyper-parameters of the mean and covariance functions are optimized with respect to the marginal likelihood (type-II maximum likelihood) using the gradient-based BFGS algorithm. In the GUI you can observe the development of the likelihood and the values of the hyper-parameters over the BFGS iterations. The output of the final Gaussian process model can also be visualized using a line chart that shows the mean prediction and the 95% confidence intervals.
</p>
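<p>
To make the optimization target concrete, here is a hedged sketch of the negative log marginal likelihood for a zero-mean GP with a squared-exponential covariance (both are assumptions for this example). Instead of BFGS, the sketch simply compares a handful of candidate length scales; this grid search is a stand-in chosen for brevity and is not what HeuristicLab does. All names are illustrative.
</p>

```python
import math

def sq_exp_kernel(x1, x2, length_scale, signal_var):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    d = x1 - x2
    return signal_var * math.exp(-0.5 * d * d / length_scale ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def cho_solve(l, b):
    """Solve (L L^T) x = b via forward and back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

def neg_log_marginal_likelihood(xs, ys, length_scale, signal_var, noise_var):
    """-log p(y | X, hyper-parameters) for a zero-mean GP with Gaussian noise."""
    n = len(xs)
    k = [[sq_exp_kernel(xs[i], xs[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    l = cholesky(k)
    alpha = cho_solve(l, ys)
    data_fit = 0.5 * sum(y * a for y, a in zip(ys, alpha))
    complexity = sum(math.log(l[i][i]) for i in range(n))   # 0.5 * log det K
    return data_fit + complexity + 0.5 * n * math.log(2.0 * math.pi)

xs = [0.5 * i for i in range(12)]
ys = [math.sin(x) for x in xs]
candidates = [0.1, 0.3, 1.0, 3.0, 10.0]
# Grid search as a simple substitute for gradient-based BFGS.
scores = {ls: neg_log_marginal_likelihood(xs, ys, ls, 1.0, 1e-2) for ls in candidates}
best_length_scale = min(scores, key=scores.get)
```

<p>
A gradient-based optimizer such as BFGS minimizes the same quantity but additionally uses its derivatives with respect to each hyper-parameter, so it can tune several hyper-parameters at once instead of scanning a fixed grid.
</p>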
<p>
Line chart of the negative log-likelihood:
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/gaussian_processes/GP%20likelihood.png"><img width="450px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/gaussian_processes/GP%20likelihood.png" /></a>
</p>
<p>
Line chart of the optimized hyper-parameters:
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/gaussian_processes/GP%20hyperparams.png"><img width="450px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/gaussian_processes/GP%20hyperparams.png" /></a>
</p>
<p>
Output of the model (mean and confidence interval):
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/gaussian_processes/GP%20model.png"><img width="450px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/gaussian_processes/GP%20model.png" /></a>
</p>
<p>
We observed that Gaussian process models often produce very accurate predictions, especially for low-dimensional data sets with up to 5000 training points. For larger data sets the computational effort becomes prohibitive (we have not yet implemented sparse approximations).
</p>
DataAnalysis, Features
Math notation for symbolic models
gkronber
Sun, 05 Feb 2012 16:22:16 GMT
http://dev.heuristiclab.com/trac.fcgi/blog/gkronber/formularendering
<p>
In February I have a little more time available that I can spend on <tt>HeuristicLab</tt> development. So I implemented a new view that shows genetic programming solutions for symbolic data analysis problems in conventional math notation. This has been on our wishlist for a long time, but until now we didn't see a good way of implementing it. The implementation is not ideal because it relies on the <a class="ext-link" href="http://www.mathjax.org"><span class="icon"></span>MathJax</a> library (JavaScript) to display the models in a web browser control. Using the daily build of the trunk version you can try this new feature. I hope you find it useful.
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/formularendering/mathnotation1.png"><img width="600px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/formularendering/mathnotation1.png" /></a>
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/formularendering/mathnotation2.png"><img width="600px" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/formularendering/mathnotation2.png" /></a>
</p>
SystemIdentification, DataAnalysis, UIView
Financial Analysis with HeuristicLab
gkronber
Fri, 02 Sep 2011 15:23:29 GMT
http://dev.heuristiclab.com/trac.fcgi/blog/gkronber/financial_analysis
<p>
One application that I've been interested in lately is financial analysis.
</p>
<p>
Recently I've looked at interest rate swaps in more detail. Interest rate swaps are an important financial instrument for controlling risk, but are also used for speculative purposes. Using the genetic programming capabilities of HeuristicLab it is relatively easy to generate a regression model to estimate the interest rate swap yield. The result for the European 10-year interest rate swap (monthly data) in the time span from April 1991 until August 2011 can be seen in the next figure.
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/financial_analysis/EZ%20IRS%20yield%2010y.png"><img src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/financial_analysis/EZ%20IRS%20yield%2010y.png" /></a>
</p>
<p>
In the last section, from index 582 onwards (July 2007 - August 2011), the output of the model (red line) deviates very strongly from the observed values.
</p>
<p>
To get a better idea of the underlying relations found by genetic programming, it is interesting to study variable impacts. The most important variables for the 10-year European interest rate swap yield found through the genetic programming runs are:
</p>
<table class="wiki">
<tr><th> Most relevant variables </th><th> Hold out set
</th></tr><tr><td> US M1, US Mortgage Market Index </td><td> March 1991 - April 1995
</td></tr><tr><td> Eurozone Employment qq, US M1, US Corporate Profits </td><td> April 1995 - May 1999
</td></tr><tr><td> US Corporate Profits, Eurozone Employment qq, US U Michigan Expectations Prelim. </td><td> May 1999 - June 2003
</td></tr><tr><td> US Corporate Profits, Eurozone Employment qq </td><td> June 2003 - July 2007
</td></tr><tr><td> US Existing Home Sales </td><td> July 2007 - August 2011
</td></tr></table>
<p>
Interestingly, the most important variables differ between time spans. Only corporate profits and the number of employed persons in the Eurozone are detected as relevant over a longer time span.
</p>
<p>
The following table shows the variable impact calculation results for the first fold in greater detail. It can be clearly seen that the US money supply M1 and the US mortgage market index are used repeatedly in all models. This strongly indicates a correlation between these variables and the interest rate swap yield over the time span from April 1995 until August 2011, which was used as the training set for these models. As can be seen in the previous chart, the output on the hold-out set (March 1991 - April 1995) is relatively accurate.
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/blog/gkronber/financial_analysis/EZ%20IRS%20yield%20variable%20impacts.png"><img src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/blog/gkronber/financial_analysis/EZ%20IRS%20yield%20variable%20impacts.png" /></a>
</p>
Applications, DataAnalysis
New Feature for System Identification: Model Response View
gkronber
Wed, 24 Aug 2011 10:18:25 GMT
http://dev.heuristiclab.com/trac.fcgi/blog/gkronber/model_response_view
<p>
Last week I implemented a new view for symbolic regression models that makes it possible to analyse the impact of a given input variable on the output of the model in more detail. I'm already looking forward to applying it to real-world scenarios.
</p>
<p>
<a style="padding:0; border:none" href="http://dev.heuristiclab.com/trac.fcgi/attachment/ticket/1621/responsefunction.png"><img width="600px" alt="Screenshot of initial implementation idea" title="Screenshot of initial implementation idea" src="http://dev.heuristiclab.com/trac.fcgi/raw-attachment/ticket/1621/responsefunction.png" /></a>
</p>
<p>
Development of this feature is tracked in ticket <a class="closed ticket" href="http://dev.heuristiclab.com/trac.fcgi/ticket/1621" title="feature request: View to analyze the response behavior of a regression model (closed: done)">#1621</a>.
</p>
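<p>
One plausible way to compute such a response curve is to vary a single input over a range of values while holding all other inputs at fixed baseline values, and to record the model output for each value. The sketch below illustrates that idea only; it is an assumption about the general technique, not a description of how the view is implemented. The model, the variable names and the function <tt>response_curve</tt> are hypothetical.
</p>

```python
def response_curve(model, baseline, variable, values):
    """Evaluate the model while varying one input and holding the rest fixed."""
    curve = []
    for v in values:
        row = dict(baseline)   # copy so the baseline itself is not modified
        row[variable] = v
        curve.append((v, model(row)))
    return curve

def example_model(row):
    # Hypothetical symbolic regression model: 2*x1 + x2^2
    return 2.0 * row["x1"] + row["x2"] ** 2

baseline = {"x1": 1.0, "x2": 3.0}
curve = response_curve(example_model, baseline, "x1", [0.0, 0.5, 1.0])
# curve == [(0.0, 9.0), (0.5, 10.0), (1.0, 11.0)]
```

<p>
Plotting the resulting pairs gives a line chart of the model output as a function of the chosen variable. The shape of the curve depends on the chosen baseline whenever the model contains interactions between variables.
</p>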
SystemIdentification, DataAnalysis