Opened 11 years ago

Closed 10 years ago

#939 closed feature request (done)

Data types and operators for classification problems

Reported by: gkronber
Owned by: swagner
Priority: high
Milestone: HeuristicLab 3.3.2
Component: ZZZ OBSOLETE: Problems.DataAnalysis.Classification
Version: 3.3.2
Keywords:
Cc:

Description


Attachments (2)

SGP-SymbolicClassification-Mammographic.hl (44.0 KB) - added by gkronber 11 years ago.
SGP symbolic classification - mammographic dataset (UCI)
strange_separator.png (64.1 KB) - added by gkronber 11 years ago.


Change History (66)

comment:1 Changed 11 years ago by gkronber

  • Version changed from 3.2 to 3.3

comment:2 Changed 11 years ago by gkronber

  • Version changed from 3.3 to 3.3.1

comment:3 Changed 11 years ago by gkronber

  • Owner changed from gkronber to mkommend

comment:4 Changed 11 years ago by mkommend

  • Priority changed from major to critical

comment:5 Changed 11 years ago by mkommend

  • Status changed from new to accepted

comment:6 Changed 11 years ago by mkommend

Added branch for classification with r4303.

Last edited 11 years ago by mkommend

comment:7 Changed 11 years ago by mkommend

Corrected classification plugin infrastructure with r4304.

comment:8 Changed 11 years ago by mkommend

Updated classification branch with r4323.

comment:9 Changed 11 years ago by mkommend

Created branch of HeuristicLab.Problems.DataAnalysis with r4324.

comment:10 Changed 11 years ago by mkommend

Corrected project references of HeuristicLab.Problems.DataAnalysis with r4325.

comment:11 Changed 11 years ago by mkommend

Added draft version of classification with r4366.

comment:12 Changed 11 years ago by mkommend

Added classification views with r4367.

comment:13 Changed 11 years ago by mkommend

Updated classification branch with r4391.

comment:14 Changed 11 years ago by mkommend

Added SymbolicClassificationPearsonRSquaredEvaluator with r4392.

comment:15 Changed 11 years ago by mkommend

Corrected DataAnalysisProblemData ctor with r4393.

comment:16 Changed 11 years ago by mkommend

Updated classification views with r4394.

comment:17 Changed 11 years ago by mkommend

Deleted outdated plugin Problems.Classification.Views with r4395.

comment:18 Changed 11 years ago by mkommend

Adapted regression classes to work with classification plugin with r4415.

comment:19 Changed 11 years ago by mkommend

Adapted classification analyzer and readded views project with r4417.

comment:20 Changed 11 years ago by mkommend

Removed accidentally committed plugin file with r4450.

comment:21 Changed 11 years ago by mkommend

Marked SymbolicClassificationProblem as StorableContent with r4452.

Last edited 11 years ago by mkommend

comment:22 Changed 11 years ago by mkommend

Added logic to remove the test samples from the training samples with r4469.
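
The change itself is not shown in this ticket. As a minimal sketch of the idea, assuming both partitions are given as half-open index ranges over the dataset rows, the training rows can be filtered so that no row is used for both training and testing. The function name and parameters are hypothetical and are not taken from the plugin.

```python
def training_indices(training_start, training_end, test_start, test_end):
    """Return the training row indices with all test rows removed.

    Both partitions are half-open ranges [start, end). The names are
    illustrative assumptions, not HeuristicLab identifiers.
    """
    test_rows = range(test_start, test_end)
    # Membership tests on a range of ints are constant time.
    return [i for i in range(training_start, training_end) if i not in test_rows]


if __name__ == "__main__":
    # Training rows 0..9 overlap with test rows 7..11, so 7, 8 and 9 are dropped.
    print(training_indices(0, 10, 7, 12))
```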

comment:23 Changed 11 years ago by mkommend

  • Version changed from 3.3.1 to branch

comment:24 Changed 11 years ago by mkommend

  • Version changed from branch to 3.3.1

Moved DataAnalysis.Classification from branch to trunk with r4565.

comment:25 Changed 11 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing

comment:26 Changed 11 years ago by mkommend

  • Owner changed from gkronber to mkommend
  • Status changed from reviewing to assigned

comment:27 Changed 11 years ago by mkommend

  • Status changed from assigned to accepted

comment:28 Changed 11 years ago by mkommend

Corrected handling of ClassNames in the ClassificationProblemData with r4618.

comment:29 Changed 11 years ago by mkommend

  • Status changed from accepted to reviewing

comment:30 Changed 11 years ago by mkommend

  • Status changed from reviewing to assigned

comment:31 Changed 11 years ago by mkommend

  • Status changed from assigned to accepted

comment:32 Changed 11 years ago by mkommend

  • Status changed from accepted to reviewing

Deleted classification branch with r4706.

comment:33 Changed 11 years ago by mkommend

  • Owner changed from mkommend to gkronber

comment:34 follow-up: Changed 11 years ago by gkronber

Misclassification matrix is not regenerated when other parameters for the classification problem are changed (e.g. the target variable).

comment:35 follow-up: Changed 11 years ago by gkronber

List of class names is not regenerated when other parameters of the classification problem are changed (e.g. the target variable, or data set partition).

comment:36 follow-up: Changed 11 years ago by gkronber

TestSamplesStart parameter value is not adapted correctly after importing a new dataset.

Last edited 11 years ago by gkronber

comment:37 follow-up: Changed 11 years ago by gkronber

In the validation analyzer the accuracy of the validation-best model is calculated. This can lead to an exception when the model produces NaN estimations.

In symbolic regression NaN values are replaced by the upper estimation bound before accuracy metrics are calculated. I recommend a similar scheme for classification.

OperatorExecutionException: An exception was thrown by the operator "ValidationBestSymbolicClassificationSolutionAnalyzer": Accuracy is not defined for NaN or infinity elements

-----
ArgumentException: Accuracy is not defined for NaN or infinity elements
   at HeuristicLab.Problems.DataAnalysis.Classification.OnlineAccuracyEvaluator.Add(Double original, Double estimated)
   at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.UpdateBestSolutionResults()
   at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.Apply()
   at HeuristicLab.Operators.Operator.Execute(IExecutionContext context)
   at HeuristicLab.SequentialEngine.SequentialEngine.ProcessNextOperation()
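
A minimal sketch of the recommended scheme, assuming a binary problem with class values 0.0 and 1.0 and a fixed threshold of 0.5: NaN estimates are replaced by the upper estimation limit and infinite estimates are clipped to the limits before accuracy is computed. All names below are illustrative and are not the plugin's API.

```python
import math


def sanitize(estimates, lower_limit, upper_limit):
    """Replace NaN by the upper limit and clip all other values to the limits."""
    cleaned = []
    for value in estimates:
        if math.isnan(value):
            cleaned.append(upper_limit)
        else:
            cleaned.append(min(max(value, lower_limit), upper_limit))
    return cleaned


def to_class(value, threshold=0.5):
    """Map a continuous estimate to one of two class values (assumed 0.0 and 1.0)."""
    return 1.0 if value >= threshold else 0.0


def accuracy(original_classes, estimated_classes):
    """Fraction of samples whose estimated class equals the original class."""
    if not original_classes or len(original_classes) != len(estimated_classes):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for o, e in zip(original_classes, estimated_classes) if o == e)
    return hits / len(original_classes)


if __name__ == "__main__":
    raw = [float("nan"), float("inf"), float("-inf"), 0.3]
    cleaned = sanitize(raw, 0.0, 1.0)
    predicted = [to_class(v) for v in cleaned]
    print(cleaned, accuracy([1.0, 1.0, 0.0, 0.0], predicted))
```

With this preprocessing the accuracy calculation never sees NaN or infinity, so the ArgumentException shown above cannot be triggered by the model output.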

Last edited 10 years ago by gkronber

comment:38 Changed 11 years ago by gkronber

Calculation or display of the ROC curve is very slow for slightly larger datasets (>2000 samples).
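
The cause of the slowdown is not stated here. For reference, the points of a ROC curve can be computed with a single sort and one pass over the samples, which keeps the cost at O(n log n). The sketch below assumes boolean labels and scores where higher means "more positive"; it is not the plugin's implementation.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: booleans, True for the positive class.
    scores: model outputs, higher means more likely positive.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Sort sample indices once by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Samples with identical scores share one threshold and one point.
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([True, True, False, False], [0.9, 0.8, 0.3, 0.1])
    print(pts, area_under_curve(pts))
```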

comment:39 Changed 11 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from reviewing to assigned

Please reassign the ticket to me for review when the issues discussed above have been addressed. Thanks.

comment:40 Changed 11 years ago by mkommend

  • Status changed from assigned to accepted

comment:41 Changed 11 years ago by mkommend

Corrected updating of ClassificationProblemData parameters after data import with r4780.

comment:42 in reply to: ↑ 34 Changed 11 years ago by mkommend

Replying to gkronber:

Misclassification matrix is not regenerated when other parameters for the classification problem are changed (e.g. the target variable).

Corrected with r4780.

comment:43 in reply to: ↑ 35 Changed 11 years ago by mkommend

Replying to gkronber:

List of class names is not regenerated when other parameters of the classification problem are changed (e.g. the target variable, or data set partition).

Corrected with r4780.

comment:44 in reply to: ↑ 36 Changed 11 years ago by mkommend

Replying to gkronber:

TestSamplesStart parameter value is not adapted correctly after importing a new dataset.

Corrected with r4780.

comment:45 Changed 11 years ago by mkommend

Corrected adaption of the test range with r4781.

comment:46 in reply to: ↑ 37 Changed 11 years ago by mkommend

Replying to gkronber:

In the validation analyzer the accuracy of the validation-best model is calculated. This can lead to an exception when the model produces NaN estimations.

In symbolic regression NaN values are replaced by the upper estimation bound before accuracy metrics are calculated. I recommend a similar scheme for classification.

OperatorExecutionException: An exception was thrown by the operator "ValidationBestSymbolicClassificationSolutionAnalyzer": Accuracy is not defined for NaN or infinity elements

-----
ArgumentException: Accuracy is not defined for NaN or infinity elements
   at HeuristicLab.Problems.DataAnalysis.Classification.OnlineAccuracyEvaluator.Add(Double original, Double estimated)
   at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.UpdateBestSolutionResults()
   at HeuristicLab.Problems.DataAnalysis.Classification.ValidationBestSymbolicClassificationSolutionAnalyzer.Apply()
   at HeuristicLab.Operators.Operator.Execute(IExecutionContext context)
   at HeuristicLab.SequentialEngine.SequentialEngine.ProcessNextOperation()

I could not reproduce this bug and also checked the source code to see whether NaN elements are filtered. Could you attach a sample, or test again whether the bug still exists?

comment:47 Changed 11 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing

Corrected SymbolicRegressionSolution to use UpperEstimationLimit instead of double.NaN with r4797.

comment:48 Changed 11 years ago by swagner

  • Milestone changed from HeuristicLab x.x.x to HeuristicLab 3.3.2

comment:49 follow-up: Changed 11 years ago by gkronber

The order of variables occurring in the target variable drop down box does not match the order of variables in the original dataset. This also has the effect that a random variable is initially selected as the target variable.

comment:50 Changed 11 years ago by gkronber

For binary classification problems, the AUC values for the two classes, displayed in the legend tooltip of the ROC curve view, do not match. Shouldn't the two AUC values be the same when only two classes are available?
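
A small worked example of why equal values would be expected, assuming a single discriminant output whose orientation is reversed when the second class is treated as the positive one. AUC is computed here as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The data and function name are made up for illustration.

```python
def auc_pairwise(positive_scores, negative_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a small illustration.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up model outputs for the samples of each class.
scores_class_a = [0.9, 0.4, 0.7]
scores_class_b = [0.2, 0.5, 0.1]

# Class A as positive: high outputs indicate class A.
auc_a = auc_pairwise(scores_class_a, scores_class_b)

# Class B as positive: the output is negated so that high means class B.
auc_b = auc_pairwise([-s for s in scores_class_b],
                     [-s for s in scores_class_a])

# Class B as positive WITHOUT reversing the orientation.
auc_b_unflipped = auc_pairwise(scores_class_b, scores_class_a)

if __name__ == "__main__":
    print(auc_a, auc_b, auc_b_unflipped)
```

Under these assumptions the two per-class values are identical, while forgetting to reverse the orientation yields one minus the AUC of the other class. Whether this is the cause of the mismatch in the view is not established in this ticket.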

comment:51 Changed 11 years ago by gkronber

A demo for classification would be nice.

comment:52 follow-up: Changed 11 years ago by gkronber

Probably the attached file would be a nice demo.

Changed 11 years ago by gkronber

  • Attachment SGP-SymbolicClassification-Mammographic.hl added

SGP symbolic classification - mammographic dataset (UCI)

comment:53 follow-up: Changed 11 years ago by gkronber

In the symbolic classification view, the class visibility is reset when the content is set. This is a bit annoying.

comment:54 Changed 11 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from reviewing to assigned

comment:55 Changed 11 years ago by gkronber

The automatic calculation of the separator value for the discriminating function produces strange results (see attached screenshot).
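
How the plugin computes the separator is not described in this ticket. For comparison, one simple and common scheme tries the midpoints between neighbouring distinct model outputs as candidate separators and keeps the one with the highest training accuracy. The sketch below assumes a binary problem with boolean labels; all names are hypothetical.

```python
def best_threshold(outputs, labels):
    """Pick the separator that maximizes accuracy on the given samples.

    outputs: continuous model outputs.
    labels: booleans, True for the class predicted when output >= threshold.
    Returns (threshold, accuracy). This is an illustrative scheme, not the
    plugin's method.
    """
    distinct = sorted(set(outputs))
    # Candidate separators lie halfway between neighbouring distinct outputs.
    candidates = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        raise ValueError("need at least two distinct output values")
    best_value, best_hits = None, -1
    for threshold in candidates:
        hits = sum(1 for out, label in zip(outputs, labels)
                   if (out >= threshold) == label)
        if hits > best_hits:
            best_value, best_hits = threshold, hits
    return best_value, best_hits / len(outputs)


if __name__ == "__main__":
    print(best_threshold([0.1, 0.2, 0.8, 0.9], [False, False, True, True]))
```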

Changed 11 years ago by gkronber

  • Attachment strange_separator.png added

comment:56 in reply to: ↑ 52 Changed 11 years ago by swagner

Replying to gkronber:

Probably the attached file would be a nice demo.

Please feel free to add this sample to the samples in the optimizer.

comment:57 in reply to: ↑ 49 Changed 10 years ago by mkommend

Replying to gkronber:

The order of variables occurring in the target variable drop down box does not match the order of variables in the original dataset. This also has the effect that a random variable is initially selected as the target variable.

Corrected ordering of valid target variables with r4836.

comment:58 Changed 10 years ago by mkommend

Corrected ValidationBestSymbolicClassificationSolutionAnalyzer with r4837.

comment:59 in reply to: ↑ 53 Changed 10 years ago by mkommend

Replying to gkronber:

In the symbolic classification view, the class visibility is reset when the content is set. This is a bit annoying.

Although this is annoying, there is nothing we can do about it, because the view must be reset whenever new content is set. There was also a bug in the solution analyzer that was fixed with r4837; the best solution is now updated correctly and less often, so the reset occurs less frequently.

comment:60 Changed 10 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from assigned to reviewing

comment:61 Changed 10 years ago by gkronber

  • Owner changed from gkronber to mkommend

I really don't like the random order of variables in the target variable drop down. Also, as soon as there are binary input variables, the target variable selection heuristic fails.

However, this is only my subjective opinion and this issue does not necessarily have to be fixed for the 3.3.2 release. (fixed in r4836)

Everything else looks great, thanks for implementing this plugin!

Feel free to release the plugin and close this ticket.

Last edited 10 years ago by gkronber

comment:62 Changed 10 years ago by gkronber

Updated symbolic classification with SGP demo with r4884.

comment:63 Changed 10 years ago by gkronber

  • Owner changed from mkommend to swagner
  • Status changed from reviewing to readytorelease

comment:64 Changed 10 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.1 to 3.3.2