Opened 4 years ago

Closed 4 years ago

#1999 closed feature request (done)

Regression problem instances for testing feature selection

Reported by: gkronber Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.8
Component: Problems.Instances Version: 3.3.8
Keywords: Cc:

Description


Change History (9)

comment:1 Changed 4 years ago by gkronber

r9093: added a provider and a configurable problem instance for testing feature selection

comment:2 Changed 4 years ago by gkronber

r9094: formatting

comment:3 Changed 4 years ago by gkronber

  • Status changed from new to accepted

comment:4 Changed 4 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:5 follow-up: Changed 4 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

Reviewing comments:

  • Use LINQ syntax in FeatureSelectionInstanceProvider.GetDataDescriptors, as it is IMHO more readable.
  • Make the training and test samples configurable in the FeatureSelection DataDescriptor.
  • The ranges for the input variables and weights should also be configurable.
  • Why are the generated input values normally distributed, N(0,1), rather than uniformly distributed?
  • The formula for the sigma of the noise RNG (targetSigma * Math.Sqrt(noiseRatio)) obviously works, but I don't understand why. Is it because targetSigma is approximately 1.0?
  • The final formula y = f(x,w)+e with the selected variables and the corresponding weights should be displayed somewhere (e.g. in the problem description or as a problem data parameter).

comment:6 in reply to: ↑ 5 Changed 4 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

r9217: improved implementation of feature selection problem instances based on the review comments by mkommend.

  • Created a PRNG for uniformly distributed values with a specified range [min..max[
  • Created a class FeatureSelectionRegressionProblemData, derived from RegressionProblemData, with additional informative parameters
  • fixed typos: shuffeled and varialbe
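
As an illustration of the uniform PRNG with a half-open range [min..max[ mentioned above, here is a minimal Python sketch. It is not the HeuristicLab C# class from r9217; the class name UniformRandom and the method next_double are hypothetical names chosen for this example.

```python
import random


class UniformRandom:
    """Hypothetical helper: draws floats from the half-open range [lo, hi)."""

    def __init__(self, lo, hi, seed=None):
        if not lo < hi:
            raise ValueError("lo must be smaller than hi")
        self.lo = lo
        self.hi = hi
        self._rng = random.Random(seed)

    def next_double(self):
        # random() returns a float in [0.0, 1.0), so the scaled value
        # stays below hi (up to floating-point rounding at the upper edge).
        return self.lo + (self.hi - self.lo) * self._rng.random()
```

For example, UniformRandom(0.0, 10.0, seed=42) would correspond to the Uniform(0,10) default for the weights; passing a seed makes the sequence reproducible.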

Replying to mkommend:

Reviewing comments:

  • Use LINQ syntax in FeatureSelectionInstanceProvider.GetDataDescriptors, as it is IMHO more readable.

Fixed in r9217

  • Make the training and test samples configurable in the FeatureSelection DataDescriptor.

Fixed in r9217 (default values: training samples = 20% more than the number of features, test samples = 5000)

  • The ranges for the input variables and weights should also be configurable.

Fixed in r9217 by adding IRandom parameters to generate the values. Default values x: Normal(0,1) and weights: Uniform(0,10)

  • Why are the generated input values normally distributed, N(0,1), rather than uniformly distributed?
  • The formula for the sigma of the noise RNG (targetSigma * Math.Sqrt(noiseRatio)) obviously works, but I don't understand why. Is it because targetSigma is approximately 1.0?

The last two points were discussed in person.
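
The ticket does not record the outcome of that discussion. One reading under which the sigma formula holds for any targetSigma, not only for values near 1.0, is sketched below. It assumes (this is not stated in the ticket) that noiseRatio, written r here, denotes the ratio of the noise variance to the variance of the noise-free target f(x,w), and that the noise e is independent of f.

```latex
\[
  r = \frac{\operatorname{Var}(e)}{\operatorname{Var}(f)}
  \quad\Longrightarrow\quad
  \sigma_e = \sqrt{r \,\operatorname{Var}(f)} = \sigma_f \sqrt{r}
\]
\[
  R^2_{\max}
  = \frac{\operatorname{Var}(f)}{\operatorname{Var}(f) + \operatorname{Var}(e)}
  = \frac{1}{1 + r}
\]
```

Here \sigma_f plays the role of targetSigma, and the second line gives the best R² any model could reach under the same assumption.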

  • The final formula y = f(x,w)+e with the selected variables and the corresponding weights should be displayed somewhere (e.g. in the problem description or as a problem data parameter).

Added informative parameters for the selected features, the weights, and the best achievable R² to the ProblemData parameter (by introducing a new derived class with the additional parameters).
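
To make the data-generating process y = f(x,w)+e more concrete, here is a rough Python sketch using the defaults named in this ticket (x: Normal(0,1), weights: Uniform(0,10), noise sigma = targetSigma * sqrt(noiseRatio)). It is not the code from FeatureSelection.cs. The function name make_instance, its parameters, the linear form of f, and the interpretation of noise_ratio as noise variance divided by the variance of the noise-free target are all assumptions made for this example.

```python
import math
import random
import statistics


def make_instance(n_features, n_selected, n_rows, noise_ratio, seed=0):
    """Hypothetical generator for a feature selection regression instance."""
    rng = random.Random(seed)
    # Pick which input variables actually influence the target.
    selected = sorted(rng.sample(range(n_features), n_selected))
    weights = [rng.uniform(0.0, 10.0) for _ in selected]
    # All inputs are drawn from Normal(0, 1), relevant or not.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
          for _ in range(n_rows)]
    # Noise-free target: weighted sum of the selected variables (assumed linear f).
    clean = [sum(w * row[j] for w, j in zip(weights, selected)) for row in xs]
    target_sigma = statistics.pstdev(clean)
    noise_sigma = target_sigma * math.sqrt(noise_ratio)
    ys = [f + rng.gauss(0.0, noise_sigma) for f in clean]
    # Under the assumed definition of noise_ratio, no model can exceed this R².
    best_r2 = 1.0 / (1.0 + noise_ratio)
    return {"x": xs, "y": ys, "clean": clean, "selected": selected,
            "weights": weights, "best_r2": best_r2}
```

The returned selected, weights, and best_r2 entries mirror the kind of informative values that the comment above describes as being exposed on the problem data.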

Please review the changes again.

comment:7 Changed 4 years ago by gkronber

r9218: fixed build failure

comment:8 Changed 4 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

r9231: Corrected rounding in FeatureSelection.cs for the display of the optimal R² in the problem data description.

Reviewed r9217 and found no mistakes other than the one fixed in r9231.

comment:9 Changed 4 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.7 to 3.3.8