Opened 7 years ago

Closed 11 months ago

Last modified 7 weeks ago

#1973 closed enhancement (done)

Support more than 256 variables in linear regression models

Reported by: swinkler Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.16
Component: Algorithms.DataAnalysis Version: trunk
Keywords: Cc: sschalle

Description (last modified by gkronber)

Linear regression crashes if more than 256 input features are used. (The tree representing the linear model has too many subtrees.)

Attachments (1)

LR Interpreter Test Script.hl (1.9 KB) - added by mkommend 19 months ago.
Test Script


Change History (13)

comment:1 Changed 7 years ago by gkronber

  • Description modified (diff)
  • Summary changed from Problem with linear regression if number of variables > 256 to Linear regression models with more than 256 variables are not supported

I'm tempted to reject this ticket, as it is unlikely that this would produce a useful solution. If we allow more than 256 variables, I'd prefer that we produce a different kind of model (no symbolic expression tree) in such cases.

For me the actual issue is that we do not have standard feature selection methods for LR or regularized linear models.

More discussion is needed before we decide on the further steps.

comment:2 Changed 6 years ago by gkronber

Ticket on LR with feature selection: #745.

comment:3 Changed 19 months ago by mkommend

  • Status changed from new to accepted

Changed 19 months ago by mkommend

Test Script

comment:4 Changed 19 months ago by mkommend

r15766: Adapted symbolic expression tree compilers to allow more than 256 child nodes.

The issue is not that the LR implementation we use (ALGLIB) does not allow more than 256 variables, but that our symbolic models only allow 256 child nodes in an expression tree. LR models are transformed into a symbolic model by adding all features as variables below an addition node, and hence this error occurred.
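To illustrate the transformation described in this comment, here is a minimal Python sketch (not HeuristicLab code). It builds a single addition node with one weighted variable child per feature plus a constant, so a model with 300 features yields a root with 301 children. The names `Node`, `linear_model_to_tree`, `evaluate`, and `MAX_CHILDREN` are hypothetical, and the exact way the old limit was enforced is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical cap standing in for the pre-r15766 limit on child nodes.
MAX_CHILDREN = 256


@dataclass
class Node:
    symbol: str                      # "add", "var", or "const"
    name: str = ""                   # variable name (only for "var")
    weight: float = 1.0              # coefficient, or the constant's value
    children: List["Node"] = field(default_factory=list)


def linear_model_to_tree(coefficients: Dict[str, float],
                         intercept: float,
                         max_children: Optional[int] = None) -> Node:
    """Place every feature as a weighted variable below one addition node."""
    root = Node("add")
    for name, weight in coefficients.items():
        root.children.append(Node("var", name=name, weight=weight))
    root.children.append(Node("const", weight=intercept))
    # Assumed behaviour: a tree with too many children is rejected.
    if max_children is not None and len(root.children) > max_children:
        raise ValueError(
            f"{len(root.children)} child nodes exceed the limit of {max_children}")
    return root


def evaluate(node: Node, row: Dict[str, float]) -> float:
    """Evaluate the tree for one data row."""
    if node.symbol == "add":
        return sum(evaluate(child, row) for child in node.children)
    if node.symbol == "var":
        return node.weight * row[node.name]
    return node.weight  # "const"


# 300 features: fine without a cap, rejected with the hypothetical cap.
coefficients = {f"x{i}": 1.0 for i in range(300)}
tree = linear_model_to_tree(coefficients, intercept=0.5)
prediction = evaluate(tree, {name: 1.0 for name in coefficients})
```

With the hypothetical cap enabled (`max_children=MAX_CHILDREN`), the same call raises a `ValueError`, which mirrors the failure reported in this ticket: the regression itself succeeds, and only the conversion to a tree fails.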

comment:5 Changed 19 months ago by mkommend

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.16
  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing
  • Version changed from 3.3.7 to trunk

Please review the changes in r15766 carefully, as they affect many data analysis parts of HeuristicLab.

Furthermore, I have no idea how these adaptations affect the performance of the interpreters, but I suspect / hope not much.

comment:6 Changed 18 months ago by gkronber

The test run times on the builder do not show a direct effect of r15766.

comment:7 Changed 18 months ago by gkronber

Reviewed r15766. Looks good.

comment:8 Changed 18 months ago by gkronber

  • Status changed from reviewing to readytorelease

comment:9 Changed 18 months ago by gkronber

r15836: merged r15766 from trunk to stable.

comment:10 Changed 11 months ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed

comment:11 Changed 7 weeks ago by abeham

  • Type changed from defect to enhancement

comment:12 Changed 7 weeks ago by abeham

  • Summary changed from Linear regression models with more than 256 variables are not supported to Support more than 256 variables in linear regression models