#1973 closed enhancement (done)
Support more than 256 variables in linear regression models
Reported by: | swinkler | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.16 |
Component: | Algorithms.DataAnalysis | Version: | trunk |
Keywords: | | Cc: | sschalle |
Description (last modified by gkronber)
Linear regression crashes if more than 256 input features are to be used. (The number of subtrees of the tree representing the linear model is too big.)
Attachments (1)
Change History (13)
comment:1 Changed 7 years ago by gkronber
- Description modified (diff)
- Summary changed from Problem with linear regression if number of variables > 256 to Linear regression models with more than 256 variables are not supported
comment:2 Changed 7 years ago by gkronber
Ticket on LR with feature selection: #745.
comment:3 Changed 2 years ago by mkommend
- Status changed from new to accepted
comment:4 Changed 2 years ago by mkommend
r15766: Adapted symbolic expression tree compilers to allow more than 256 child nodes.
The issue is not that the LR implementation we use (ALGLIB) is limited to 256 variables, but that our symbolic models only allow 256 child nodes in an expression tree. LR models are transformed into a symbolic model by adding all features as variables below an addition, and hence this error occurred.
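For illustration, a minimal Python sketch of the transformation described above. This is not HeuristicLab code: the `Node` class, `linear_model_to_tree`, `evaluate`, and the `max_children` parameter are hypothetical names, and treating the intercept as one more child of the addition is an assumption made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical limit; it only mirrors the number mentioned in this ticket.
MAX_CHILDREN = 256


@dataclass
class Node:
    symbol: str                      # "add", "var" or "const"
    name: str = ""                   # variable name, used by "var" nodes
    weight: float = 0.0              # coefficient ("var") or value ("const")
    children: List["Node"] = field(default_factory=list)


def linear_model_to_tree(weights: Dict[str, float], intercept: float,
                         max_children: int = MAX_CHILDREN) -> Node:
    """Build one addition node with a weighted variable child per feature."""
    root = Node("add")
    for name, w in weights.items():
        root.children.append(Node("var", name=name, weight=w))
    # Assumption: the intercept is stored as one more child of the addition.
    root.children.append(Node("const", weight=intercept))
    if len(root.children) > max_children:
        raise ValueError(
            f"addition node has {len(root.children)} children, "
            f"limit is {max_children}")
    return root


def evaluate(node: Node, row: Dict[str, float]) -> float:
    """Evaluate the tree for a single data row."""
    if node.symbol == "add":
        return sum(evaluate(child, row) for child in node.children)
    if node.symbol == "var":
        return node.weight * row[node.name]
    return node.weight  # "const"


# A small model fits below the limit, a wide one does not.
small = linear_model_to_tree({"x1": 2.0, "x2": -1.0}, intercept=0.5)
print(evaluate(small, {"x1": 1.0, "x2": 3.0}))

wide_weights = {f"x{i}": 1.0 for i in range(300)}
try:
    linear_model_to_tree(wide_weights, intercept=0.0)
except ValueError as err:
    print("rejected:", err)
```

The flat tree grows by one child per feature, so the limit is hit by the model representation and not by the regression itself. Raising `max_children` in the sketch lets the wide model through, which corresponds in spirit to the change in r15766.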
comment:5 Changed 2 years ago by mkommend
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.16
- Owner changed from mkommend to gkronber
- Status changed from accepted to reviewing
- Version changed from 3.3.7 to trunk
Please review the changes in r15766 carefully, as they affect many data analysis parts of HeuristicLab.
Furthermore, I have no idea how these adaptations affect the performance of the interpreters, but I suspect / hope not much.
comment:6 Changed 2 years ago by gkronber
The test run times on the builder do not show a direct effect of r15766.
comment:7 Changed 2 years ago by gkronber
Reviewed r15766. Looks good.
comment:8 Changed 2 years ago by gkronber
- Status changed from reviewing to readytorelease
comment:9 Changed 2 years ago by gkronber
comment:10 Changed 19 months ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
comment:11 Changed 9 months ago by abeham
- Type changed from defect to enhancement
comment:12 Changed 9 months ago by abeham
- Summary changed from Linear regression models with more than 256 variables are not supported to Support more than 256 variables in linear regression models
I'm tempted to reject this ticket, as it is unlikely that this would produce a useful solution. If we allow more than 256 variables, I'd prefer that we produce a different kind of model (no symbolic expression tree) in such cases.
For me the actual issue is that we do not have standard feature selection methods for LR or regularized linear models.
More discussion is needed before we decide on the further steps.