Opened 6 years ago

Closed 4 years ago

#1292 closed enhancement (done)

Show correlation of dataset features as HeatMap

Reported by: mkommend
Owned by: mkommend
Priority: medium
Milestone: HeuristicLab 3.3.8
Component: Problems.DataAnalysis.Views
Version: 3.3.8
Keywords:
Cc:

Description


Change History (53)

comment:1 Changed 6 years ago by mkommend

  • Status changed from new to accepted

comment:2 Changed 6 years ago by gkronber

Please also add a view that shows the result of the Jarque-Bera test (normality test, implemented in alglib) for each variable.
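
For reference, the Jarque-Bera statistic can be sketched in its textbook asymptotic form, JB = n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis. The function name `jarque_bera` and the chi-squared (2 degrees of freedom) p-value are assumptions made for illustration. alglib's routine may compute the p-value differently, for instance with small-sample corrections, so this is not a description of that implementation.

```python
import math

def jarque_bera(values):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera normality test.

    Hypothetical helper for illustration; not the alglib implementation.
    """
    n = len(values)
    if n < 2:
        return math.nan, math.nan
    mean = sum(values) / n
    # Biased central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return math.nan, math.nan  # constant variable: skewness is undefined
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-squared distribution with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value
```

A per-variable view could call such a function once per dataset column and list the resulting statistic and p-value.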

comment:3 Changed 6 years ago by mkommend

  • Status changed from accepted to assigned

comment:4 Changed 6 years ago by swagner

  • Milestone changed from HeuristicLab 3.3.3 to HeuristicLab x.x.x

comment:5 Changed 5 years ago by mkommend

  • Owner changed from mkommend to sforsten

comment:6 Changed 5 years ago by gkronber

r7969: added HoeffdingsDependenceCalculator to calculate the non-parametric Hoeffding's dependency. Ideally it should be possible to show either Pearson's R², Spearman's rank correlation, or Hoeffding's dependency in the heat-map.
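
As a rough illustration of two of the measures named above, the following pure-Python sketch computes Pearson's R² and Spearman's rank correlation and fills a feature-by-feature matrix of the kind a heat map would display. All function names are hypothetical and the code is not taken from HeuristicLab; Hoeffding's dependence is left out for brevity.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient, or NaN if it is undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return math.nan
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant variable has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def pearson_r_squared(xs, ys):
    """Squared Pearson coefficient, in [0, 1]."""
    return pearson_r(xs, ys) ** 2

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's R on the ranks (sign is kept)."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

def correlation_matrix(columns, measure):
    """Apply `measure` to every pair of named columns (dict of name -> list)."""
    names = list(columns)
    return [[measure(columns[a], columns[b]) for b in names] for a in names]
```

Passing either `pearson_r_squared` or `spearman_rho` as `measure` yields the matrix for the selected method, which is one way to let the user switch between measures in the same view.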

comment:7 Changed 5 years ago by sforsten

  • Status changed from assigned to accepted

r8034: create branch to show correlation of dataset features
r8035: branch project for implementing HeatMap to show correlation of dataset features
r8036: branch another project for implementing HeatMap to show correlation of dataset features

r8038:

  • completed branch creation
  • first simple implementation of a HeatMap, which shows the correlation of the dataset features

comment:8 Changed 5 years ago by sforsten

r8276:

  • merged r8034:8179 from trunk
  • added BackgroundWorker
  • added ProgressBar
  • added SpearmansRankCorrelationCoefficientCalculator
  • corrected bug in HoeffdingsDependenceCalculator
  • made some changes in the GUI

comment:9 Changed 5 years ago by gkronber

Please just use the alglib function for calculating Spearman's rank correlation. Rename method 'Spear'.

comment:10 Changed 5 years ago by sforsten

r8294:

  • SpearmansRankCorrelationCoefficientCalculator now uses the alglib function
  • strings in ExtendedHeatMap have been made constant

comment:11 Changed 5 years ago by sforsten

r8318:

  • added cloning method and constructor to ExtendedHeatMap
  • renamed a variable in ExtendedHeatMapView
  • added backwards compatibility code in DataAnalysisProblemData

comment:12 Changed 5 years ago by sforsten

  • Owner changed from sforsten to mkommend
  • Status changed from accepted to reviewing

comment:13 Changed 5 years ago by sforsten

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.8
  • Version changed from 3.3.2 to branch

comment:14 Changed 5 years ago by gkronber

  • Don't calculate the absolute value in Spearman's rank correlation.
  • Please add a property R or Correlation that simply returns the correlation coefficient in the Pearson's correlation calculator.

comment:15 Changed 5 years ago by mkommend

  • Owner changed from mkommend to sforsten
  • Status changed from reviewing to assigned

comment:16 Changed 5 years ago by gkronber

r8355:

  • fixed bugs in HoeffdingsDependenceCalculator
  • added test cases for HoeffdingsDependenceCalculator

comment:17 Changed 5 years ago by sforsten

r8483:

  • Renamed ExtendedHeatMap to FeatureCorrelation
  • deleted old CorrelationHeatMapView
  • added FeatureCorrelationView

comment:18 Changed 5 years ago by sforsten

r8492:

  • added TimeframeFeatureCorrelationView

comment:19 Changed 5 years ago by sforsten

  • Owner changed from sforsten to mkommend
  • Status changed from assigned to reviewing

comment:20 Changed 5 years ago by mkommend

r8525: Added bin directory and resharper files to list of SVN excluded files.

comment:21 Changed 5 years ago by mkommend

r8526: Corrected build configurations in DatasetCorrelation branch.

comment:22 Changed 5 years ago by sforsten

r8529:

  • BackgroundWorker is now reused in FeatureCorrelation
  • renamed some variables
  • ComboBoxes are now DropDownLists
  • FeatureCorrelation doesn't calculate the elements in the constructor anymore
  • small changes in the views

comment:23 Changed 5 years ago by mkommend

r8537: Improved drawing of feature correlation view.

comment:24 Changed 5 years ago by mkommend

r8538: Merged trunk changes in preparation of the branch reintegration.

comment:25 Changed 5 years ago by mkommend

r8542: Integrated correlation analysis of datasets in the trunk.

comment:26 Changed 5 years ago by mkommend

  • Owner changed from mkommend to sforsten
  • Status changed from reviewing to assigned
  • Version changed from branch to 3.3.8

The following issues must be addressed:

  • Views of the same object are not synchronized.
  • The default constructor doesn't assign a problem data to the feature correlation, which could lead to exceptions.
  • Use start and end values to calculate the correlation instead of strings declaring which partition should be used.
  • Remove the obsolete branch when all changes are implemented.
Last edited 5 years ago by mkommend

comment:27 Changed 5 years ago by mkommend

r8543: Removed the feature correlation from the data analysis problem data, as the implementation is not yet finished and could otherwise lead to persistence breaks.

comment:28 Changed 5 years ago by gkronber

r8559: removed the default constructor for FeatureCorrelation as it simply runs into a NullReferenceException (the default ctor is not used anywhere and is senseless).

This fixes the unit test failure for the meta-optimization branch on the builder.

comment:29 Changed 5 years ago by sforsten

r8578:

  • added ProblemDataView which has a button to open the feature correlation
  • added abstract base class for feature correlations
  • added caches for the feature correlation
  • created own class for calculation of feature correlation
  • changed SelectedItemChanged to SelectionChangeCommitted events, so the correlation is only calculated if the user changes the selection

comment:30 Changed 5 years ago by sforsten

  • Status changed from assigned to accepted

r8579: deleted obsolete branch

comment:31 Changed 5 years ago by sforsten

  • Owner changed from sforsten to mkommend
  • Status changed from accepted to reviewing

comment:32 Changed 5 years ago by sforsten

r8581: removed unnecessary reference

comment:33 follow-up: Changed 5 years ago by abeham

If possible, I suggest limiting the correlation analysis to only the allowed input variables plus the target variable. That way you can apply some filtering, and it could help you iteratively refine your input variables.

comment:34 Changed 5 years ago by mkommend

The correlation analysis throws an exception if too few values are passed to the selected calculator.

comment:35 in reply to: ↑ 33 Changed 5 years ago by mkommend

Replying to abeham:

If possible, I suggest limiting the correlation analysis to only the allowed input variables plus the target variable. That way you can apply some filtering, and it could help you iteratively refine your input variables.

This is a good point and should be implemented.

comment:36 Changed 4 years ago by sforsten

r8689:

  • NaN values are used, if the calculation is invalid (e.g. missing values, infinity etc.)
  • Variables can now be filtered. Initially allowed input variables and target variable are shown, but with a right click a dialog can be opened to select variables, which shall be shown
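
The NaN convention described above can be sketched as a guard around any pairwise measure. The name `guarded_correlation` and the `min_count` parameter are hypothetical, and the sketch assumes that missing values are encoded as NaN; the actual checks in r8689 may differ.

```python
import math

def guarded_correlation(measure, xs, ys, min_count=2):
    """Return measure(xs, ys), or NaN whenever the calculation is invalid.

    Assumption: missing values are represented as NaN in the input lists.
    """
    # Too few values, or columns of different length.
    if len(xs) != len(ys) or len(xs) < min_count:
        return math.nan
    # Missing (NaN) or infinite values in either variable.
    if not all(math.isfinite(v) for v in xs):
        return math.nan
    if not all(math.isfinite(v) for v in ys):
        return math.nan
    try:
        result = measure(xs, ys)
    except (ZeroDivisionError, ValueError):
        return math.nan
    # The measure itself may still produce NaN or infinity.
    return result if math.isfinite(result) else math.nan
```

With a guard of this kind, a single invalid variable pair results in one NaN cell instead of an exception that aborts the whole matrix calculation.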

comment:37 follow-up: Changed 4 years ago by abeham

I have a few remarks:

  • I would restrict Pearson's R² to only use green-yellow-red colors. It's a bit confusing that in Pearson's R green means no correlation, but in R² it means medium correlation, while red retains its meaning.
  • Hoeffding's dependence doesn't have 1s in the diagonal (why?)
  • Numbers are not easily readable if they're on a dark-blue background

comment:38 in reply to: ↑ 37 Changed 4 years ago by gkronber

Replying to abeham:

I have a few remarks:

  • Hoeffding's dependence doesn't have 1s in the diagonal (why?)

This is correct behaviour when the variable contains duplicate values.

comment:39 Changed 4 years ago by mkommend

r8728: Corrected SpearmansRankCalculator.

comment:40 Changed 4 years ago by mkommend

r8729: Moved FeatureCorrelation specific classes from Problems.DataAnalysis to Problems.DataAnalysis.Views.

comment:41 Changed 4 years ago by mkommend

  • Owner changed from mkommend to sforsten
  • Status changed from reviewing to assigned

Currently the TimeFrameCorrelationView is displayed by default instead of the "normal" CorrelationView. Furthermore, we should discuss the source code in detail.

comment:42 Changed 4 years ago by sforsten

  • Status changed from assigned to accepted
  • Caches shall be directly in the (Timeframe-)FeatureCorrelationView
  • Caches should use Tuple<T1, T2> instead of nested dictionaries
  • AbstractFeatureCorrelationView shall inherit from StringConvertibleMatrixView to reduce code duplication
  • Replace HeatMap variable 'currentCorrelation' with double[,]
Last edited 4 years ago by sforsten

comment:43 Changed 4 years ago by sforsten

  • add a text box to TimeframeCorrelationView to input how many time frames shall be calculated (remove combo box which is currently used)

comment:44 Changed 4 years ago by sforsten

  • add an interface for dependency correlation calculators

comment:45 Changed 4 years ago by sforsten

  • Owner changed from sforsten to mkommend
  • Status changed from accepted to reviewing

r8833:

  • removed combo box in TimeframeCorrelationView and added a textbox instead
  • caches are directly in (Timeframe-)FeatureCorrelationView
  • caches use Tuple<> instead of nested dictionaries
  • a control EnhancedStringConvertibleMatrix inherits from StringConvertibleMatrixView to reduce code duplication
  • add interface IDependencyCalculator to several calculators
  • fixed bug: a previously started calculation is cancelled, if a new calculation shall be started and the values are already in the cache
  • fixed bug: if the content is changed, the calculation is cancelled

HeatMap is still used for the dependency representation, because a class is needed which implements IStringConvertibleMatrix and it has a maximum and minimum value.
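
The switch from nested dictionaries to tuple keys can be illustrated with a small cache sketch. The class name `CorrelationCache`, its methods, and the choice of (calculator, partition) as the key are assumptions for illustration; the actual key components used in the views are not stated in this ticket.

```python
class CorrelationCache:
    """Cache of correlation matrices keyed by a (calculator, partition) tuple.

    Hypothetical sketch: a single flat dictionary replaces
    cache[calculator][partition] style nesting.
    """

    def __init__(self):
        self._entries = {}

    def get_or_compute(self, calculator, partition, compute):
        """Return the cached result, calling `compute()` only on a miss."""
        key = (calculator, partition)
        if key not in self._entries:
            self._entries[key] = compute()
        return self._entries[key]

    def clear(self):
        """Drop all entries, e.g. when the underlying dataset changes."""
        self._entries.clear()
```

A flat key needs only one lookup and no check for a missing inner dictionary, which is the main simplification over the nested variant.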

comment:46 Changed 4 years ago by sforsten

r8834: changed CalculateHoeffdingsDTest due to a change in the name of a static method

comment:47 Changed 4 years ago by sforsten

r8861:

  • put IDependencyCalculators in own directory
  • changed DoubleRange Interval to double Minimum/Maximum in IDependencyCalculator
  • AbstractFeatureCorrelationView now uses DoubleMatrix instead of HeatMap
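
A calculator interface that exposes a minimum and maximum, as described above, might look like the following sketch. Only the idea of a shared interface with Minimum and Maximum comes from this ticket; the Python class names, the `calculate` method, and the `to_unit_interval` helper are hypothetical and do not reflect the actual members of IDependencyCalculator.

```python
import math
from abc import ABC, abstractmethod

class DependencyCalculator(ABC):
    """Hypothetical counterpart of an interface with Minimum and Maximum."""

    @property
    @abstractmethod
    def minimum(self):
        """Smallest value the measure can take."""

    @property
    @abstractmethod
    def maximum(self):
        """Largest value the measure can take."""

    @abstractmethod
    def calculate(self, xs, ys):
        """Return the dependency between two equally long value lists."""

class PearsonsRCalculator(DependencyCalculator):
    @property
    def minimum(self):
        return -1.0

    @property
    def maximum(self):
        return 1.0

    def calculate(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        if sxx == 0 or syy == 0:
            return math.nan
        return sxy / math.sqrt(sxx * syy)

def to_unit_interval(calculator, value):
    """Scale a result to [0, 1], e.g. to pick a color for a matrix cell."""
    if math.isnan(value):
        return math.nan
    return (value - calculator.minimum) / (calculator.maximum - calculator.minimum)
```

Because every calculator reports its own range, a view can map any measure to the same color scale without knowing which measure is selected.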

comment:48 Changed 4 years ago by sforsten

r8870:

  • corrected displaying of the Text of StringConvertibleMatrixVisibilityDialog and its subclasses
  • corrected sorting of EnhancedStringConvertibleMatrixView

comment:49 Changed 4 years ago by mkommend

  • Owner changed from mkommend to sforsten
  • Status changed from reviewing to assigned

Reviewing comments:

  • Unify names of controls that are used in views (e.g., CorrelationCalcLabel, minimumLabel, HeatMapProgressBar)
  • TimeFrameFeatureCorrelationView:
    • TimeFrameTextBox should check for valid values in the validating event
    • The correlation should be recalculated after the validated event of the TimeFrameTextBox was triggered. Currently it does not react to TAB.
  • Move partitions from the FeatureCorrelationHelper to the AbstractFeatureCorrelationView and use indexes in the bwInfo.
  • Remove *.resx files from repository.
  • Add missing license headers (.designer files).
  • Show / Hide rows throws an exception during dialog creation for the symbolic classification example.
  • The selected variable for the timeframe correlation should be the target variable per default.
Last edited 4 years ago by mkommend

comment:50 Changed 4 years ago by mkommend

r8874: Minor code cleanup in feature correlation classes.

comment:51 Changed 4 years ago by sforsten

  • Owner changed from sforsten to mkommend
  • Status changed from assigned to reviewing

r8880: implemented changes suggested by mkommend in comment:49:ticket:1292
r8881: renamed some controls

Also corrected DeregisterContentEvents in AbstractFeatureCorrelationView. AbstractFeatureCorrelationView.resx can't be removed, because it contains an image of the possible colors for the correlation.

Last edited 4 years ago by sforsten

comment:52 Changed 4 years ago by mkommend

  • Status changed from reviewing to readytorelease

comment:53 Changed 4 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed