Opened 15 years ago
Closed 14 years ago
#939 closed feature request (done)
Data types and operators for classification problems
Reported by: | gkronber | Owned by: | swagner |
---|---|---|---|
Priority: | high | Milestone: | HeuristicLab 3.3.2 |
Component: | ZZZ OBSOLETE: Problems.DataAnalysis.Classification | Version: | 3.3.2 |
Keywords: | Cc: |
Description
Attachments (2)
Change History (66)
comment:1 Changed 15 years ago by gkronber
- Version changed from 3.2 to 3.3
comment:2 Changed 14 years ago by gkronber
- Version changed from 3.3 to 3.3.1
comment:3 Changed 14 years ago by gkronber
- Owner changed from gkronber to mkommend
comment:4 Changed 14 years ago by mkommend
- Priority changed from major to critical
comment:5 Changed 14 years ago by mkommend
- Status changed from new to accepted
comment:6 Changed 14 years ago by mkommend
comment:7 Changed 14 years ago by mkommend
Corrected classification plugin infrastructure with r4304.
comment:8 Changed 14 years ago by mkommend
Updated classification branch with r4323.
comment:9 Changed 14 years ago by mkommend
Created branch of HeuristicLab.Problems.DataAnalysis with r4324.
comment:10 Changed 14 years ago by mkommend
Corrected project references of HeuristicLab.Problems.DataAnalysis with r4325.
comment:11 Changed 14 years ago by mkommend
Added draft version of classification with r4366.
comment:12 Changed 14 years ago by mkommend
Added classification views r4367.
comment:13 Changed 14 years ago by mkommend
Updated classification branch with r4391.
comment:14 Changed 14 years ago by mkommend
Added SymbolicClassificationPearsonRSquaredEvaluator with r4392.
comment:15 Changed 14 years ago by mkommend
Corrected DataAnalysisProblemData ctor with r4393.
comment:16 Changed 14 years ago by mkommend
Updated classification views with r4394.
comment:17 Changed 14 years ago by mkommend
Deleted outdated plugin Problems.Classification.Views with r4395.
comment:18 Changed 14 years ago by mkommend
Adapted regression classes to work with classification plugin with r4415.
comment:19 Changed 14 years ago by mkommend
Adapted classification analyzer and readded views project with r4417.
comment:20 Changed 14 years ago by mkommend
Removed accidentally committed plugin file with r4450.
comment:21 Changed 14 years ago by mkommend
Marked SymbolicClassificationProblem as StorableContent with r4452.
comment:22 Changed 14 years ago by mkommend
Added logic to remove the test samples from the training samples with r4469.
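The effect of this change can be pictured with a small sketch. It is not the HeuristicLab implementation: the function name `training_indices` and the half-open index ranges are assumptions made purely for illustration.

```python
def training_indices(train_start, train_end, test_start, test_end):
    """Return training row indices with any rows in the test range removed.

    Both ranges are treated as half-open [start, end); this convention is an
    assumption of the sketch, not something stated in the ticket.
    """
    test_rows = range(test_start, test_end)
    return [i for i in range(train_start, train_end) if i not in test_rows]


# The training range 0..9 overlaps the test range 6..11 in rows 6..9,
# so only rows 0..5 remain available for training.
rows = training_indices(0, 10, 6, 12)
print(rows)
```

Keeping the two partitions disjoint prevents test rows from leaking into the data a model is fitted on.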
comment:23 Changed 14 years ago by mkommend
- Version changed from 3.3.1 to branch
comment:24 Changed 14 years ago by mkommend
- Version changed from branch to 3.3.1
Moved DataAnalysis.Classification from branch to trunk with r4565.
comment:25 Changed 14 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from accepted to reviewing
comment:26 Changed 14 years ago by mkommend
- Owner changed from gkronber to mkommend
- Status changed from reviewing to assigned
comment:27 Changed 14 years ago by mkommend
- Status changed from assigned to accepted
comment:28 Changed 14 years ago by mkommend
Corrected handling of ClassNames in the ClassificationProblemData with r4618.
comment:29 Changed 14 years ago by mkommend
- Status changed from accepted to reviewing
comment:30 Changed 14 years ago by mkommend
- Status changed from reviewing to assigned
comment:31 Changed 14 years ago by mkommend
- Status changed from assigned to accepted
comment:32 Changed 14 years ago by mkommend
- Status changed from accepted to reviewing
Deleted classification branch with r4706.
comment:33 Changed 14 years ago by mkommend
- Owner changed from mkommend to gkronber
comment:34 follow-up: ↓ 42 Changed 14 years ago by gkronber
Misclassification matrix is not regenerated when other parameters for the classification problem are changed (e.g. the target variable).
comment:35 follow-up: ↓ 43 Changed 14 years ago by gkronber
List of class names is not regenerated when other parameters of the classification problem are changed (e.g. the target variable, or data set partition).
comment:36 follow-up: ↓ 44 Changed 14 years ago by gkronber
TestSamplesStart parameter value is not adapted correctly after importing a new dataset.
comment:37 follow-up: ↓ 46 Changed 14 years ago by gkronber
In the validation analyzer the accuracy of the validation-best model is calculated. This can lead to an exception when the model produces NaN estimations.
In symbolic regression NaN values are replaced by the upper estimation bound before accuracy metrics are calculated. I recommend a similar scheme for classification.
OperatorExecutionException: An exception was thrown by the operator "ValidationBestSymbolicClassificationSolutionAnalyzer": Accuracy is not defined for NaN or infinity elements
-----
ArgumentException: Accuracy is not defined for NaN or infinity elements
  at HeuristicLab.Problems.DataAnalysis.Classification.OnlineAccuracyEvaluator.Add(Double original, Double estimated)
  at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.UpdateBestSolutionResults()
  at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.Apply()
  at HeuristicLab.Operators.Operator.Execute(IExecutionContext context)
  at HeuristicLab.SequentialEngine.SequentialEngine.ProcessNextOperation()
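The recommended scheme could look roughly like the sketch below. It is a language-neutral illustration in Python, not HeuristicLab code: the names `clamp_estimate`, `nearest_class` and `accuracy` are hypothetical, and clipping infinite values into the estimation limits and mapping an estimate to the nearest class value are assumptions added for the example. Only the replacement of NaN by the upper estimation bound is taken from the comment above.

```python
import math


def clamp_estimate(value, lower, upper):
    """Replace NaN by the upper bound and clip all other values into [lower, upper]."""
    if math.isnan(value):
        return upper
    return max(lower, min(upper, value))


def nearest_class(value, class_values):
    """Map a numeric estimate to the closest class value (assumed decision rule)."""
    return min(class_values, key=lambda c: abs(c - value))


def accuracy(original, estimated, class_values, lower, upper):
    """Fraction of rows whose cleaned estimate maps to the original class."""
    hits = 0
    for target, estimate in zip(original, estimated):
        cleaned = clamp_estimate(estimate, lower, upper)
        if nearest_class(cleaned, class_values) == target:
            hits += 1
    return hits / len(original)


original = [0, 1, 1, 0]
estimated = [0.1, float("nan"), 0.8, float("inf")]
print(accuracy(original, estimated, class_values=[0, 1], lower=0.0, upper=1.0))
```

Because every estimate is cleaned before it reaches the accuracy calculation, the evaluator never sees NaN or infinite inputs and has no reason to throw.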
comment:38 Changed 14 years ago by gkronber
Calculation or display of the ROC curve is very slow for slightly larger datasets (>2000 samples).
comment:39 Changed 14 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from reviewing to assigned
Please reassign the ticket to me for review when the issues discussed above have been addressed. Thanks.
comment:40 Changed 14 years ago by mkommend
- Status changed from assigned to accepted
comment:41 Changed 14 years ago by mkommend
Corrected updating of ClassificationProblemData parameters after data import with r4780.
comment:42 in reply to: ↑ 34 Changed 14 years ago by mkommend
comment:43 in reply to: ↑ 35 Changed 14 years ago by mkommend
comment:44 in reply to: ↑ 36 Changed 14 years ago by mkommend
comment:45 Changed 14 years ago by mkommend
Corrected adaptation of the test range with r4781.
comment:46 in reply to: ↑ 37 Changed 14 years ago by mkommend
Replying to gkronber:
In the validation analyzer the accuracy of the validation-best model is calculated. This can lead to an exception when the model produces NaN estimations.
In symbolic regression NaN values are replaced by the upper estimation bound before accuracy metrics are calculated. I recommend a similar scheme for classification.
OperatorExecutionException: An exception was thrown by the operator "ValidationBestSymbolicClassificationSolutionAnalyzer": Accuracy is not defined for NaN or infinity elements
-----
ArgumentException: Accuracy is not defined for NaN or infinity elements
  at HeuristicLab.Problems.DataAnalysis.Classification.OnlineAccuracyEvaluator.Add(Double original, Double estimated)
  at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.UpdateBestSolutionResults()
  at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.Apply()
  at HeuristicLab.Operators.Operator.Execute(IExecutionContext context)
  at HeuristicLab.SequentialEngine.SequentialEngine.ProcessNextOperation()
I could not reproduce this bug, and I also checked in the source code that NaN elements are filtered. Could you attach a sample or test again whether the bug still exists?
comment:47 Changed 14 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from accepted to reviewing
Corrected SymbolicRegressionSolution to use UpperEstimationLimit instead of double.NaN with r4797.
comment:48 Changed 14 years ago by swagner
- Milestone changed from HeuristicLab x.x.x to HeuristicLab 3.3.2
comment:49 follow-up: ↓ 57 Changed 14 years ago by gkronber
The order of variables occurring in the target variable drop down box does not match the order of variables in the original dataset. This also has the effect that a random variable is initially selected as the target variable.
comment:50 Changed 14 years ago by gkronber
For binary classification problems, the AUC values for the two classes, displayed in the legend tooltip of the ROC curve view, do not match. Shouldn't the two AUC values be the same when only two classes are available?
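A small worked example supports the expectation that both values coincide. The sketch below is not the ROC code of the plugin: the function `auc`, the sample scores and the labels are hypothetical, and it assumes that one class is ranked by the model output and the other class by the negated output.

```python
def auc(scores, labels, positive):
    """AUC as the probability that a positive row outranks a negative row.

    Ties between a positive and a negative row count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs; larger values are assumed to indicate class "b".
scores = [0.1, 0.4, 0.35, 0.8, 0.4]
labels = ["a", "a", "b", "b", "b"]

auc_b = auc(scores, labels, "b")
# For class "a" the ranking is reversed, so the scores are negated.
auc_a = auc([-s for s in scores], labels, "a")
print(auc_a, auc_b)
```

Under these assumptions every pair of rows that counts as a win for one class also counts as a win for the other after negation, so the two values are identical. A mismatch in the view would therefore suggest that the two curves are built from different scores or thresholds.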
comment:51 Changed 14 years ago by gkronber
A demo for classification would be nice.
comment:52 follow-up: ↓ 56 Changed 14 years ago by gkronber
Probably the attached file would be a nice demo.
comment:53 follow-up: ↓ 59 Changed 14 years ago by gkronber
In the symbolic classification view, the class visibility is reset when the content is set. This is a bit annoying.
comment:54 Changed 14 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from reviewing to assigned
comment:55 Changed 14 years ago by gkronber
The automatic calculation of the separator value for the discriminating function produces strange results (see attached screenshot).
Changed 14 years ago by gkronber
comment:56 in reply to: ↑ 52 Changed 14 years ago by swagner
Replying to gkronber:
Probably the attached file would be a nice demo.
Please feel free to add this sample to the samples in the optimizer.
comment:57 in reply to: ↑ 49 Changed 14 years ago by mkommend
Replying to gkronber:
The order of variables occurring in the target variable drop down box does not match the order of variables in the original dataset. This also has the effect that a random variable is initially selected as the target variable.
Corrected ordering of valid target variables with r4836.
comment:58 Changed 14 years ago by mkommend
Corrected ValidationBestSymbolicClassificationSolutionAnalyzer with r4837.
comment:59 in reply to: ↑ 53 Changed 14 years ago by mkommend
Replying to gkronber:
In the symbolic classification view, the class visibility is reset when the content is set. This is a bit annoying.
Although this is annoying, there is nothing we can do about it, because the view must be reset whenever new content is set. There was also a bug in the solution analyzer that was fixed with r4837; the best solution is now updated correctly and less often, so the reset occurs less frequently.
comment:60 Changed 14 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from assigned to reviewing
comment:61 Changed 14 years ago by gkronber
- Owner changed from gkronber to mkommend
I really don't like the random order of variables in the target variable drop down. Also as soon as there are binary input variables the target variable selection heuristic fails.
However, this is only my subjective opinion and this issue does not necessarily have to be fixed for the 3.3.2 release. (fixed in r4836)
Everything else looks great, thanks for implementing this plugin!
Feel free to release the plugin and close this ticket.
comment:62 Changed 14 years ago by gkronber
Updated symbolic classification with SGP demo with r4884.
comment:63 Changed 14 years ago by gkronber
- Owner changed from mkommend to swagner
- Status changed from reviewing to readytorelease
comment:64 Changed 14 years ago by swagner
- Resolution set to done
- Status changed from readytorelease to closed
- Version changed from 3.3.1 to 3.3.2
Added branch for classification with r4303.