Opened 11 years ago

Closed 11 years ago

#1979 closed defect (done)

Some Regression Problem Instances are not correct

Reported by: sforsten
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.8
Component: Problems.Instances
Version: 3.3.8
Keywords:
Cc:

Description

Some of the regression problem instances don't work at all (Korns 7, 9, 15), others seem to be missing some points in the training or test partition, and Keijzer 1-3 all have the same name.

Change History (20)

comment:1 Changed 11 years ago by sforsten

  • Status changed from new to accepted

r8900:

  • renamed Keijzer 1-3
  • corrected training and test partition of Keijzer 9 and 10
  • changed interval for some variables in Korn 7, 9, 15 to avoid infinity and NaN values
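A check for the kind of values mentioned in the last item could look roughly like the sketch below. It is not the HeuristicLab implementation; `non_finite_rows` and `toy_target` are hypothetical names, and the toy expression is deliberately not one of the Korns definitions. It only illustrates how a sampling interval that includes, for example, zero or negative inputs can produce infinity or NaN targets.

```python
import math


def non_finite_rows(target, rows):
    """Return the input rows for which the target is inf, -inf, NaN, or not computable."""
    bad = []
    for row in rows:
        try:
            y = target(*row)
        except (OverflowError, ValueError, ZeroDivisionError):
            # Python raises instead of returning inf/NaN for log(0), x/0, exp(1000), ...
            bad.append(row)
            continue
        if not math.isfinite(y):
            bad.append(row)
    return bad


# Hypothetical target expression, used only for illustration.
def toy_target(x, y):
    return math.log(x) / y


rows = [(1.0, 2.0), (0.0, 1.0), (2.0, 0.0), (-1.0, 1.0)]
bad = non_finite_rows(toy_target, rows)
print(bad)  # every row except the first is reported
```

Narrowing the sampling interval so that no such rows are generated is the approach described for r8900.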

comment:2 Changed 11 years ago by sforsten

  • Owner changed from sforsten to mkommend
  • Status changed from accepted to reviewing

comment:3 Changed 11 years ago by mkommend

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.8

comment:4 Changed 11 years ago by mkommend

  • Owner changed from mkommend to sforsten
  • Status changed from reviewing to readytorelease

comment:5 Changed 11 years ago by mkommend

  • Status changed from readytorelease to assigned

Please add the following problem to the real world problem instances http://symbolicregression.com/?q=emulatorProblem.

comment:6 Changed 11 years ago by gkronber

Discussion is needed. Please do not yet add the emulatorProblem...

comment:7 Changed 11 years ago by gkronber

  • Owner changed from sforsten to gkronber
  • Status changed from assigned to accepted

comment:8 Changed 11 years ago by gkronber

r8999: adapted Vladislavleva regression problem instances to match the definition of the "GP Benchmarks" paper

comment:9 follow-up: Changed 11 years ago by mkommend

I had a quick look over r8999 and I would bet that the ranges of the Kotanchek and SineCosine problems are not correct, because I cannot imagine that a test partition of ~100,000 samples was intended.

comment:10 in reply to: ↑ 9 Changed 11 years ago by gkronber

Replying to mkommend:

I had a quick look over r8999 and I would bet that the ranges of the Kotanchek and SineCosine problems are not correct, because I cannot imagine that a test partition of ~100,000 samples was intended.

Well, it is defined this way in the GP benchmarks paper and implemented like that in ECJ, so it seems to be intentional. Additionally, the large test partition should not be a problem, as it is only needed once for the test performance evaluation.

The only other problem with a large test set is Keijzer-15 (441*441 samples). All other instances have smaller test sets.
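For a sense of scale, the 441*441 figure can be worked out as below. This is only a back-of-the-envelope sketch: `grid_size` is a hypothetical helper, and the memory estimate assumes three columns (two inputs plus the target) stored as 8-byte doubles, which is an assumption and not taken from the ticket.

```python
def grid_size(points_per_axis, dimensions=2):
    """Number of samples in a full grid with the same point count on every axis."""
    return points_per_axis ** dimensions


keijzer15_test = grid_size(441)
print(keijzer15_test)  # 194481

# Assumed layout: 3 columns of 8-byte doubles per sample.
approx_bytes = keijzer15_test * 3 * 8
print(approx_bytes / 1e6)  # roughly 4.7 (megabytes)
```

Under these assumptions the test set stays in the range of a few megabytes, which fits the remark that a large test partition evaluated once is not a practical problem.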

Last edited 11 years ago by gkronber

comment:11 Changed 11 years ago by gkronber

Review comments about r8900: Korns 7, 9, and 15 do work, but the range of the target variable is very large because exp(50) occurs in the target expression. Since this is how the problem is defined, we should keep it that way for now. I'm awaiting further input from the gp-benchmarks mailing list.
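The magnitude involved can be checked quickly. The sketch below only evaluates exp(50) on its own, not any full Korns expression, and confirms that it is very large but still a finite double-precision value.

```python
import math
import sys

value = math.exp(50)
print(value)                       # on the order of 5.18e21
print(math.isfinite(value))        # True: large, but neither inf nor NaN
print(value < sys.float_info.max)  # True: far below the double-precision maximum

# For comparison, overflow only occurs for much larger exponents.
try:
    math.exp(710)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)  # True
```

So a term like exp(50) stretches the target range over many orders of magnitude without by itself producing infinity, which is consistent with the observation that the instances do work.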

comment:12 Changed 11 years ago by gkronber

r9007: cross-checked all regression problem instances with the GP benchmarks paper and adapted them where I thought necessary.

comment:13 Changed 11 years ago by gkronber

r9008: fixed ranges of Vladislavleva-5

comment:14 Changed 11 years ago by gkronber

r9013: fixed range for Vladislavleva-1 as suggested on the GP-benchmarks mailing list.

comment:15 Changed 11 years ago by gkronber

r9091: added a new artificial benchmark problem for regression specifically for testing feature selection algorithms

... and reverted in r9092 (will be tracked in: 1999)

Version 1, edited 11 years ago by gkronber

comment:16 Changed 11 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

Please review r9007:9008 and r9013 (r9091 has been reverted). I'm still waiting for the original authors' answer on the Vladislavleva-5 ranges, but as this is only a notational difference (the data being equal), I think we can leave it this way.

Last edited 11 years ago by gkronber

comment:17 Changed 11 years ago by gkronber

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

comment:18 Changed 11 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

comment:19 Changed 11 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

comment:20 Changed 11 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.7 to 3.3.8