Opened 8 years ago
Closed 7 years ago
#2760 closed feature request (done)
Shuffle samples in the cross-validation wrapper for data analysis algorithms
Reported by: | bburlacu | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.15 |
Component: | Algorithms.DataAnalysis | Version: | 3.3.14 |
Keywords: | | Cc: | |
Description
The cross-validation wrapper should offer an option to shuffle the data samples.
Change History (22)
comment:1 Changed 8 years ago by bburlacu
- Owner set to bburlacu
- Status changed from new to accepted
comment:2 Changed 8 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from accepted to reviewing
comment:3 Changed 8 years ago by bburlacu
r14865: Fix issue with resources in CrossValidationView.Designer.cs
comment:4 Changed 8 years ago by gkronber
It seems that the ensemble does not correctly store whether a point was used for training or test. To reproduce:
- Use cross-validation with shuffling and produce an overfit model on purpose.
- Check line chart
- Expected result: errors for training predictions (yellow) are very small, errors for test predictions (red) are significantly higher.
- Actual result: some errors for training predictions are also high, some errors for test points are suspiciously small.
comment:5 Changed 8 years ago by bburlacu
r14904: Reuse the shuffled data when creating the solution ensemble.
comment:6 Changed 8 years ago by gkronber
comment:7 Changed 8 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
Review comments:
- Backwards compatibility is not ensured
- Shuffling can be changed during execution yielding inconsistent results
- Clone shows wrong value of shuffle samples in view
- Shuffled problemData is neither cloned nor serialized
Why do we need the shuffledProblemData at all?
comment:8 Changed 7 years ago by bburlacu
r15002: Got rid of the shuffledProblemData by using a shared seed for all the folds (so that the dataset for each fold is shuffled in exactly the same way). Backwards compatibility should be restored. Shuffling cannot be changed during algorithm execution, cloning also clones the checked value for the shuffled checkbox.
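For illustration, the shared-seed idea can be sketched as follows. This is a minimal Python sketch and not the HeuristicLab code: the names `shuffled_indices` and `fold_partitions` are hypothetical, and giving the remainder rows to the last fold is an assumption. The point is that every fold re-derives the same permutation from the seed, so no shuffled copy of the data has to be stored, cloned or serialized.

```python
import random


def shuffled_indices(n_samples, seed):
    """Return a permutation of range(n_samples) that depends only on the seed."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices


def fold_partitions(n_samples, n_folds, seed):
    """Yield (training, test) row-index lists, one pair per fold.

    Each fold rebuilds the permutation from the same seed, so all folds
    see the dataset shuffled in exactly the same way.
    """
    fold_size = n_samples // n_folds
    for fold in range(n_folds):
        perm = shuffled_indices(n_samples, seed)  # identical for every fold
        start = fold * fold_size
        # Assumption: the last fold absorbs the remainder rows.
        end = n_samples if fold == n_folds - 1 else start + fold_size
        test = perm[start:end]
        training = perm[:start] + perm[end:]
        yield training, test
```

Because the test slices are cut from one shared permutation, they are disjoint and together cover every row exactly once, which is what makes the training/test labels in the ensemble consistent.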
comment:9 Changed 7 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
comment:10 Changed 7 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
This ticket broke the backwards compatibility for CrossValidation (probably due to the shuffle sample flag).
comment:11 Changed 7 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
r15026: Ensure that the shuffleSamples flag is initialized after deserialization.
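The general pattern behind this kind of fix can be sketched in Python, assuming a simplified stand-in class. The class and field names below are hypothetical and are not the actual CrossValidation code. `__setstate__` is the hook Python's pickle calls when restoring an object, and here it plays the role of an after-deserialization hook: a field that is missing from files written by older versions gets the value that reproduces the old behavior.

```python
class CrossValidationSketch:
    """Hypothetical stand-in for an algorithm wrapper with a newly added flag."""

    def __init__(self, folds=10, shuffle_samples=False):
        self.folds = folds
        self.shuffle_samples = shuffle_samples

    def __setstate__(self, state):
        # Restore whatever fields the stored state contains.
        self.__dict__.update(state)
        # State saved before the flag existed has no entry for it;
        # default to "no shuffling", which matches the old behavior.
        self.__dict__.setdefault("shuffle_samples", False)


def restore(state):
    """Rebuild an object from a stored state dict without calling __init__."""
    obj = CrossValidationSketch.__new__(CrossValidationSketch)
    obj.__setstate__(state)
    return obj
```

Restoring an old state such as `{"folds": 5}` then gives an object with `shuffle_samples` set to `False`, while a newer state that already contains the flag keeps its stored value.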
comment:12 Changed 7 years ago by mkommend
r15077: Reordered backwards compatibility and event registration in the after-deserialization hook of CrossValidation.
comment:13 Changed 7 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
Review comments
CrossValidationView
- The ShuffleSamples checkbox should be checked / unchecked in OnContentChanged and not SetEnabledStateOfControls
- I would enable the ShuffleSamples checkbox only if the CrossValidation is prepared.
CrossValidation
- Why do the extraction and aggregation of regression and classification solutions work differently (cloning of problemData)?
comment:14 Changed 7 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
r15111: Set the check state of the ShuffleSamples checkbox in the OnContentChanged method. Enable the checkbox only when the CrossValidation is prepared.
Regarding the different way of cloning the classification solution: this is done differently to account for a special use case when the GBT algorithm with the logistic regression loss function returns a regression solution (from which a new classification solution is built).
comment:15 Changed 7 years ago by mkommend
Reviewed r15111.
comment:16 Changed 7 years ago by mkommend
- Status changed from reviewing to readytorelease
comment:17 Changed 7 years ago by gkronber
- Owner changed from mkommend to gkronber
- Status changed from readytorelease to assigned
comment:18 Changed 7 years ago by gkronber
- Status changed from assigned to accepted
comment:19 Changed 7 years ago by gkronber
- Status changed from accepted to readytorelease
comment:20 Changed 7 years ago by gkronber
comment:21 Changed 7 years ago by gkronber
Depends on #2723.
comment:22 Changed 7 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
r14864: Implement shuffling of crossvalidation samples.