Version 17 (modified by gkronber, 14 years ago)
Additional Material for Publications
This page contains a collection of additional material related to publications of members of the research group HEAL.
2010
Dissertation Kronberger
The following datasets are used in experiments in the thesis.
Artificial benchmark datasets
Friedman-I
This dataset is described in (Friedman 1991), where it is used to benchmark the multivariate adaptive regression splines (MARS) algorithm. The signal-to-noise ratio in this dataset is rather low, so it is difficult to rediscover the generating function f(x), especially the terms below the noise level (x4 and x5).
Variables x01, ..., x10 are sampled uniformly from the unit hypercube (x ~ U(0,1)). The noise term epsilon is drawn from the standard normal distribution (e ~ N(0,1)).
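The sampling scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `sample_dataset` and `placeholder_f` are hypothetical, and `placeholder_f` is a stand-in, not the Friedman-I generating function, which is not reproduced in the text of this page. Only the input distribution (uniform on the unit hypercube) and the noise distribution (standard normal) are taken from the description.

```python
import random


def sample_dataset(f, n_rows, n_vars=10, noise_sd=1.0, seed=0):
    """Draw n_rows samples with inputs x ~ U(0,1) and target f(x) + e, e ~ N(0, noise_sd)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rows, targets = [], []
    for _ in range(n_rows):
        # Each of the n_vars inputs is drawn independently from [0, 1).
        x = [rng.random() for _ in range(n_vars)]
        # Additive Gaussian noise with mean 0 and standard deviation noise_sd.
        eps = rng.gauss(0.0, noise_sd)
        rows.append(x)
        targets.append(f(x) + eps)
    return rows, targets


def placeholder_f(x):
    # Hypothetical stand-in for the generating function; replace it with
    # the actual formula from the thesis before using the data.
    return x[0] + x[1]


rows, targets = sample_dataset(placeholder_f, n_rows=5)
```

Passing the generating function as an argument keeps the sampling code independent of any particular benchmark, so the same sketch could be reused for other datasets that share this input and noise model.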
Friedman-II
Breiman-I
This dataset is described in (Breiman et al. 1984), where it is used to benchmark the classification and regression trees (CART) algorithm. The signal-to-noise ratio is rather low. In addition, the generating function contains a crisp conditional, which makes it rather difficult to rediscover with a symbolic regression approach.
Epsilon is generated from the normal distribution (e~N(0,2)).
Variables x01,..., x10 are randomly sampled attributes following the probability distributions:
Real-world datasets
Chemical-I
Chemical-II
Financial-I
Macro-Economic
Housing
References
Jerome H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics, 19(1):1-141, 1991.
Leo Breiman, Jerome H. Friedman, Charles J. Stone and Richard A. Olshen, Classification and Regression Trees, Chapman and Hall, 1984.
2011
Attachments (4)
- friedman-I.png (5.2 KB) - added by gkronber 14 years ago.
- breiman-I.png (7.0 KB) - added by gkronber 14 years ago.
- breiman-I-variables.png (6.9 KB) - added by gkronber 14 years ago.
- friedman-ii.png (8.1 KB) - added by gkronber 14 years ago.