Opened 8 years ago

Closed 22 months ago

Last modified 12 months ago

#1973 closed enhancement (done)

Support more than 256 variables in linear regression models

Reported by: swinkler
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.16
Component: Algorithms.DataAnalysis
Version: trunk
Keywords:
Cc: sschalle

Description (last modified by gkronber)

Linear regression crashes if more than 256 input features are used. (The tree representing the linear model has too many subtrees.)

Attachments (1)

LR Interpreter Test Script.hl (1.9 KB) - added by mkommend 2 years ago.
Test Script


Change History (13)

comment:1 Changed 8 years ago by gkronber

  • Description modified (diff)
  • Summary changed from Problem with linear regression if number of variables > 256 to Linear regression models with more than 256 variables are not supported

I'm tempted to reject this ticket, as it is unlikely that this would produce a useful solution. If we allow more than 256 variables, I'd prefer that we produce a different kind of model (not a symbolic expression tree) in such cases.

For me the actual issue is that we do not have standard feature selection methods for LR or regularized linear models.

More discussion is needed before we decide on the further steps.

comment:2 Changed 7 years ago by gkronber

Ticket on LR with feature selection: #745.

comment:3 Changed 2 years ago by mkommend

  • Status changed from new to accepted

Changed 2 years ago by mkommend

Test Script

comment:4 Changed 2 years ago by mkommend

r15766: Adapted symbolic expression tree compilers to allow more than 256 child nodes.

The issue is not that the LR implementation we use (ALGLIB) does not allow more than 256 variables, but that our symbolic models allow only 256 child nodes in an expression tree. LR models are transformed into a symbolic model by adding all features as variables below an addition node, and hence this error occurred.
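To illustrate the transformation described above, here is a minimal Python sketch. It is not HeuristicLab code: the names `Node`, `linear_model_to_tree`, `evaluate` and `check_arity` are hypothetical, and modeling the limit as an explicit arity check is an assumption made for illustration only. It shows why a model with 300 features ends up with one addition node that has more than 256 children.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of a toy expression tree (hypothetical, for illustration)."""
    symbol: str                      # "add", "var" or "const"
    name: str = ""                   # variable name, used by "var" nodes
    weight: float = 0.0              # coefficient ("var") or value ("const")
    children: List["Node"] = field(default_factory=list)


def linear_model_to_tree(coefficients: Dict[str, float], intercept: float) -> Node:
    """Put every weighted feature and the intercept below one addition node."""
    root = Node("add")
    for name, w in coefficients.items():
        root.children.append(Node("var", name=name, weight=w))
    root.children.append(Node("const", weight=intercept))
    return root


def evaluate(node: Node, row: Dict[str, float]) -> float:
    """Evaluate the tree for one data row."""
    if node.symbol == "var":
        return node.weight * row[node.name]
    if node.symbol == "const":
        return node.weight
    if node.symbol == "add":
        return sum(evaluate(child, row) for child in node.children)
    raise ValueError(f"unknown symbol: {node.symbol}")


def check_arity(node: Node, max_children: int) -> None:
    """Raise if any node has more children than the assumed limit allows."""
    if len(node.children) > max_children:
        raise ValueError(
            f"node '{node.symbol}' has {len(node.children)} children, "
            f"limit is {max_children}"
        )
    for child in node.children:
        check_arity(child, max_children)
```

With 300 features the root addition node has 301 children (300 variables plus the constant), so `check_arity(tree, 256)` raises in this sketch, while a larger limit accepts the same tree.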

comment:5 Changed 2 years ago by mkommend

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.16
  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing
  • Version changed from 3.3.7 to trunk

Please review the changes in r15766 carefully, as this affects many data analysis parts of HeuristicLab.

Furthermore, I have no idea how these adaptations affect the performance of the interpreters, but I suspect / hope not much.

comment:6 Changed 2 years ago by gkronber

The test run times on the builder do not show a direct effect of r15766.

comment:7 Changed 2 years ago by gkronber

Reviewed r15766. Looks good.

comment:8 Changed 2 years ago by gkronber

  • Status changed from reviewing to readytorelease

comment:9 Changed 2 years ago by gkronber

r15836: merged r15766 from trunk to stable.

comment:10 Changed 22 months ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed

comment:11 Changed 12 months ago by abeham

  • Type changed from defect to enhancement

comment:12 Changed 12 months ago by abeham

  • Summary changed from Linear regression models with more than 256 variables are not supported to Support more than 256 variables in linear regression models