Opened 14 years ago
Closed 12 years ago
#1292 closed enhancement (done)
Show correlation of dataset features as HeatMap
Reported by: | mkommend | Owned by: | mkommend |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.8 |
Component: | Problems.DataAnalysis.Views | Version: | 3.3.8 |
Keywords: | Cc: |
Description
Change History (53)
comment:1 Changed 14 years ago by mkommend
- Status changed from new to accepted
comment:2 Changed 14 years ago by gkronber
comment:3 Changed 14 years ago by mkommend
- Status changed from accepted to assigned
comment:4 Changed 14 years ago by swagner
- Milestone changed from HeuristicLab 3.3.3 to HeuristicLab x.x.x
comment:5 Changed 12 years ago by mkommend
- Owner changed from mkommend to sforsten
comment:6 Changed 12 years ago by gkronber
r7969: added HoeffdingsDependenceCalculator to calculate the non-parametric Hoeffding's dependency. Ideally it should be possible to show either Pearson's R², Spearman's rank correlation, or Hoeffding's dependency in the heat-map.
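To illustrate the idea of switching between measures in one heat map, here is a minimal sketch in plain Python. It is not the HeuristicLab code: the function names (`pearson_r`, `spearman_rho`, `correlation_matrix`) and the toy dataset are assumptions made for this example, and Hoeffding's dependence is left out for brevity.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pearson_r_squared(xs, ys):
    """Squared Pearson correlation (R^2); the sign is lost by construction."""
    return pearson_r(xs, ys) ** 2


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    return [
        1 + sum(v < x for v in values) + 0.5 * (sum(v == x for v in values) - 1)
        for x in values
    ]


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(midranks(xs), midranks(ys))


def correlation_matrix(columns, measure):
    """Apply `measure` to every pair of columns; rows and columns follow dict order."""
    names = list(columns)
    return [[measure(columns[a], columns[b]) for b in names] for a in names]


# Hypothetical toy dataset with three features.
dataset = {
    "x": [1, 2, 3, 4, 5],
    "y": [1, 4, 9, 16, 25],  # monotonic but non-linear in x
    "z": [5, 4, 3, 2, 1],    # perfectly anti-correlated with x
}

for measure in (pearson_r_squared, spearman_rho):
    print(measure.__name__)
    for row in correlation_matrix(dataset, measure):
        print("  ", [round(v, 3) for v in row])
```

Passing the measure as a function keeps the matrix code independent of which coefficient is displayed, which is the property the comment asks for.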
comment:7 Changed 12 years ago by sforsten
- Status changed from assigned to accepted
r8034: create branch to show correlation of dataset features
r8035: branch project for implementing HeatMap to show correlation of dataset features
r8036: branch another project for implementing HeatMap to show correlation of dataset features
- completed branch creation
- first simple implementation of a HeatMap, which shows the correlation of the dataset features
comment:8 Changed 12 years ago by sforsten
- merged r8034:8179 from trunk
- added BackgroundWorker
- added ProgressBar
- added SpearmansRankCorrelationCoefficientCalculator
- corrected bug in HoeffdingsDependenceCalculator
- made some changes in the GUI
comment:9 Changed 12 years ago by gkronber
Please just use the alglib function for calculating Spearman's rank correlation. Rename method 'Spear'.
comment:10 Changed 12 years ago by sforsten
- SpearmansRankCorrelationCoefficientCalculator now uses the alglib function
- strings in ExtendedHeatMap have been made constant
comment:11 Changed 12 years ago by sforsten
- added cloning method and constructor to ExtendedHeatMap
- renamed a variable in ExtendedHeatMapView
- added backwards compatibility code in DataAnalysisProblemData
comment:12 Changed 12 years ago by sforsten
- Owner changed from sforsten to mkommend
- Status changed from accepted to reviewing
comment:13 Changed 12 years ago by sforsten
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.8
- Version changed from 3.3.2 to branch
comment:14 Changed 12 years ago by gkronber
- Don't calculate the absolute value in Spearman's rank correlation.
- Please add a property R or Correlation that simply returns the correlation coefficient in the Pearson's correlation calculator.
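A small sketch of why the sign matters and what a separate `R` property could look like. The class name `PearsonsCalculator`, its `add` method, and the property names are hypothetical; this is not the calculator from the HeuristicLab sources.

```python
import math


class PearsonsCalculator:
    """Accumulates sums so that r and r^2 can be read at any time."""

    def __init__(self):
        self.n = 0
        self.sum_x = self.sum_y = 0.0
        self.sum_xx = self.sum_yy = self.sum_xy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_yy += y * y
        self.sum_xy += x * y

    @property
    def r(self):
        """Signed correlation coefficient in [-1, 1]."""
        num = self.n * self.sum_xy - self.sum_x * self.sum_y
        den = math.sqrt(
            (self.n * self.sum_xx - self.sum_x ** 2)
            * (self.n * self.sum_yy - self.sum_y ** 2)
        )
        return num / den

    @property
    def r_squared(self):
        """Squared coefficient in [0, 1]; derived from r, never the other way round."""
        return self.r ** 2


calc = PearsonsCalculator()
for x, y in zip([1, 2, 3, 4], [8, 6, 4, 2]):
    calc.add(x, y)

# r keeps the direction of the relationship; r_squared (like abs(r)) hides it.
print(calc.r, calc.r_squared)
```

Taking the absolute value inside the calculator would make a perfectly decreasing relationship indistinguishable from a perfectly increasing one, so the sketch leaves that decision to the caller.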
comment:15 Changed 12 years ago by mkommend
- Owner changed from mkommend to sforsten
- Status changed from reviewing to assigned
comment:16 Changed 12 years ago by gkronber
- fixed bugs in HoeffdingsDependenceCalculator
- added test cases for HoeffdingsDependenceCalculator
comment:17 Changed 12 years ago by sforsten
- Renamed ExtendedHeatMap to FeatureCorrelation
- deleted old CorrelationHeatMapView
- added FeatureCorrelationView
comment:18 Changed 12 years ago by sforsten
- added TimeframeFeatureCorrelationView
comment:19 Changed 12 years ago by sforsten
- Owner changed from sforsten to mkommend
- Status changed from assigned to reviewing
comment:20 Changed 12 years ago by mkommend
r8525: Added bin directory and resharper files to list of SVN excluded files.
comment:21 Changed 12 years ago by mkommend
r8526: Corrected build configurations in DatasetCorrelation branch.
comment:22 Changed 12 years ago by sforsten
- BackgroundWorker is now reused in FeatureCorrelation
- renamed some variables
- ComboBoxes are now DropDownLists
- FeatureCorrelation doesn't calculate the elements in the constructor anymore
- small changes in the views
comment:23 Changed 12 years ago by mkommend
r8537: Improved drawing of feature correlation view.
comment:24 Changed 12 years ago by mkommend
r8538: Merged trunk changes in preparation of the branch reintegration.
comment:25 Changed 12 years ago by mkommend
r8542: Integrated correlation analysis of datasets in the trunk.
comment:26 Changed 12 years ago by mkommend
- Owner changed from mkommend to sforsten
- Status changed from reviewing to assigned
- Version changed from branch to 3.3.8
The following things must be implemented:
- Views of the same object are not synchronized
- The default constructor doesn't assign problem data to the feature correlation, which could lead to exceptions
- Use start and end values to calculate the correlation instead of strings declaring which partition should be used.
- Remove the obsolete branch when all changes are implemented.
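The third point (row indices instead of partition names) can be sketched as follows. The helper `calculate_on_rows`, the half-open `[start, end)` convention, and the sample covariance used as a stand-in measure are all assumptions for this example.

```python
def covariance(xs, ys):
    """Sample covariance; used here only as a stand-in for a correlation measure."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def calculate_on_rows(measure, xs, ys, start, end):
    """Apply `measure` to rows start (inclusive) .. end (exclusive)."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    if not 0 <= start < end <= len(xs):
        raise ValueError("invalid row range")
    return measure(xs[start:end], ys[start:end])


xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 5, 3, 1]

# The caller decides which rows form a partition; no magic strings are involved.
first_half = calculate_on_rows(covariance, xs, ys, 0, 3)
second_half = calculate_on_rows(covariance, xs, ys, 3, 6)
print(first_half, second_half)
```

With explicit indices the calculation code needs no knowledge of names such as "training" or "test", and arbitrary row ranges work without further changes.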
comment:27 Changed 12 years ago by mkommend
r8543: Removed the feature correlation from the data analysis problem data as the implementation is not yet finished and otherwise it could lead to persistence breaks.
comment:28 Changed 12 years ago by gkronber
r8559: removed the default constructor for FeatureCorrelation as it simply runs into a NullReferenceException (the default ctor is not used anywhere and is senseless).
This fixes the unit test failure for the meta-optimization branch on the builder.
comment:29 Changed 12 years ago by sforsten
- added ProblemDataView which has a button to open the feature correlation
- added abstract base class for feature correlations
- added caches for the feature correlation
- created own class for calculation of feature correlation
- changed SelectedItemChanged to SelectionChangeCommitted events, so the correlation is only calculated if the user changes the selection
comment:30 Changed 12 years ago by sforsten
- Status changed from assigned to accepted
r8579: deleted obsolete branch
comment:31 Changed 12 years ago by sforsten
- Owner changed from sforsten to mkommend
- Status changed from accepted to reviewing
comment:32 Changed 12 years ago by sforsten
r8581: removed unnecessary reference
comment:33 follow-up: ↓ 35 Changed 12 years ago by abeham
If possible, I suggest limiting the correlation analysis to only the allowed input variables plus the target variable. That way you can apply some filtering and it could help you iteratively refine your input variables.
comment:34 Changed 12 years ago by mkommend
The correlation analysis throws an exception if too few values were added to the used calculator.
comment:35 in reply to: ↑ 33 Changed 12 years ago by mkommend
Replying to abeham:
If possible, I suggest limiting the correlation analysis to only the allowed input variables plus the target variable. That way you can apply some filtering and it could help you iteratively refine your input variables.
This is a good point and should be implemented.
comment:36 Changed 12 years ago by sforsten
- NaN values are used if the calculation is invalid (e.g., missing values, infinity)
- Variables can now be filtered. Initially, the allowed input variables and the target variable are shown; a right-click opens a dialog for selecting which variables are shown
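A minimal sketch of the NaN convention from the first point: instead of throwing, the calculation reports NaN whenever the input cannot yield a meaningful coefficient. The function name `safe_pearson_r` and the exact set of checks are assumptions, not the checks used in HeuristicLab.

```python
import math


def safe_pearson_r(xs, ys):
    """Pearson's r, or NaN if the input cannot produce a valid coefficient."""
    nan = float("nan")
    # Too few values or mismatched columns.
    if len(xs) != len(ys) or len(xs) < 2:
        return nan
    # Missing values (NaN) or infinities anywhere in the input.
    if not all(math.isfinite(v) for v in list(xs) + list(ys)):
        return nan
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column has zero variance, so r is undefined.
    if var_x == 0 or var_y == 0:
        return nan
    return cov / math.sqrt(var_x * var_y)


print(safe_pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(safe_pearson_r([1.0], [2.0]))
print(safe_pearson_r([1.0, float("nan"), 3.0], [2.0, 4.0, 6.0]))
print(safe_pearson_r([1.0, 1.0, 1.0], [2.0, 4.0, 6.0]))
```

A view can then render NaN cells in a neutral way rather than aborting the whole matrix because of one problematic variable pair.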
comment:37 follow-up: ↓ 38 Changed 12 years ago by abeham
I have a few remarks:
- I would restrict Pearson's R² to only use green-yellow-red colors. It's a bit confusing that in Pearson's R green means no correlation, but in R² it means medium correlation while red still retains its meaning.
- Hoeffding's dependence doesn't have 1s in the diagonal (why?)
- Numbers are not easily readable if they're on dark-blue background
comment:38 in reply to: ↑ 37 Changed 12 years ago by gkronber
Replying to abeham:
I have a few remarks:
- Hoeffding's dependence doesn't have 1s in the diagonal (why?)
This is correct behaviour when the variable contains duplicate values.
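The effect can be reproduced with a small sketch of Hoeffding's D in the commonly documented rank-based formulation (midranks for ties, tied points contributing 1/2 or 1/4 to the bivariate rank). This is an assumption about the formula, written as a straightforward O(n²) function; it is not the HoeffdingsDependenceCalculator code.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    return [
        1 + sum(v < x for v in values) + 0.5 * (sum(v == x for v in values) - 1)
        for x in values
    ]


def hoeffding_d(xs, ys):
    """Hoeffding's D scaled so that a perfect, tie-free dependence gives 1."""
    n = len(xs)
    if n != len(ys) or n < 5:
        raise ValueError("need two columns of equal length with at least 5 rows")

    def phi(a, b):
        # 1 if a is below b, 1/2 if tied, 0 otherwise.
        return 1.0 if a < b else 0.5 if a == b else 0.0

    r = midranks(xs)
    s = midranks(ys)
    # Bivariate rank: 1 + number of other points below point i in both x and y.
    q = [
        1 + sum(phi(xs[j], xs[i]) * phi(ys[j], ys[i]) for j in range(n) if j != i)
        for i in range(n)
    ]
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    return 30 * numerator / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


no_ties = [1, 2, 3, 4, 5]
with_ties = [1, 1, 2, 3, 4]

# A variable compared with itself: the diagonal entry of the matrix.
print(hoeffding_d(no_ties, no_ties))
print(hoeffding_d(with_ties, with_ties))
```

Under this formulation, the diagonal entry equals 1 only when the variable has no duplicate values; a single pair of ties already pulls it well below 1.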
comment:39 Changed 12 years ago by mkommend
r8728: Corrected SpearmansRankCalculator.
comment:40 Changed 12 years ago by mkommend
r8729: Moved FeatureCorrelation specific classes from Problems.DataAnalysis to Problems.DataAnalysis.Views.
comment:41 Changed 12 years ago by mkommend
- Owner changed from mkommend to sforsten
- Status changed from reviewing to assigned
Currently the TimeFrameCorrelationView is displayed by default instead of the "normal" CorrelationView. Furthermore we should discuss the source code in detail.
comment:42 Changed 12 years ago by sforsten
- Status changed from assigned to accepted
- Caches shall be directly in the (Timeframe-)FeatureCorrelationView
- Caches should use KeyValuePair instead of nested dictionaries
- AbstractFeatureCorrelationView shall inherit from StringConvertibleMatrixView to reduce code duplication
- Replace HeatMap variable 'currentCorrelation' with double[,]
comment:43 Changed 12 years ago by sforsten
- add a text box to TimeframeCorrelationView to input how many time frames shall be calculated (remove combo box which is currently used)
comment:44 Changed 12 years ago by sforsten
- add an interface for dependency correlation calculators
comment:45 Changed 12 years ago by sforsten
- Owner changed from sforsten to mkommend
- Status changed from accepted to reviewing
- removed combo box in TimeframeCorrelationView and added a textbox instead
- caches are directly in (Timeframe-)FeatureCorrelationView
- caches use Tuple<> instead of nested dictionaries
- a control EnhancedStringConvertibleMatrix inherits from StringConvertibleMatrixView to reduce code duplication
- add interface IDependencyCalculator to several calculators
- fixed bug: a previously started calculation is cancelled, if a new calculation shall be started and the values are already in the cache
- fixed bug: if the content is changed, the calculation is cancelled
HeatMap is still used for the dependency representation, because a class is needed which implements IStringConvertibleMatrix and it has a maximum and minimum value.
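The cache change mentioned above (one flat dictionary with tuple keys instead of nested dictionaries) might look roughly like this. The class `CorrelationCache` and the key components (calculator name, partition name) are hypothetical and chosen only for illustration.

```python
class CorrelationCache:
    """Caches computed matrices under a (calculator, partition) tuple key."""

    def __init__(self):
        self._values = {}

    def get_or_compute(self, calculator, partition, compute):
        key = (calculator, partition)  # one flat lookup, no nested dictionaries
        if key not in self._values:
            self._values[key] = compute()
        return self._values[key]

    def reset(self):
        """Drop everything, e.g. when the underlying dataset changes."""
        self._values.clear()


calls = []


def expensive_matrix():
    # Stand-in for the real correlation calculation.
    calls.append(1)
    return [[1.0, 0.5], [0.5, 1.0]]


cache = CorrelationCache()
first = cache.get_or_compute("Pearson's R", "training", expensive_matrix)
second = cache.get_or_compute("Pearson's R", "training", expensive_matrix)
other = cache.get_or_compute("Pearson's R", "test", expensive_matrix)

# The second request is served from the cache; the new partition is computed.
print(len(calls), first is second)
```

A tuple key avoids the "does the inner dictionary exist yet" checks that nested dictionaries require, and extending the key with a further component only changes one line.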
comment:46 Changed 12 years ago by sforsten
r8834: changed CalculateHoeffdingsDTest due to a change in the name of a static method
comment:47 Changed 12 years ago by sforsten
- put IDependencyCalculators in own directory
- changed DoubleRange Interval to double Minimum/Maximum in IDependencyCalculator
- AbstractFeatureCorrelationView now uses DoubleMatrix instead of HeatMap
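As a rough sketch of what a calculator interface with separate minimum and maximum values enables, the following Python analogue lets a view scale any measure onto a common color range. The class and function names are assumptions for this example and do not mirror the actual IDependencyCalculator signature.

```python
import math
from abc import ABC, abstractmethod


class DependencyCalculator(ABC):
    """Common contract: the value range plus the calculation itself."""

    @property
    @abstractmethod
    def minimum(self):
        ...

    @property
    @abstractmethod
    def maximum(self):
        ...

    @abstractmethod
    def calculate(self, xs, ys):
        ...


class PearsonsR(DependencyCalculator):
    @property
    def minimum(self):
        return -1.0

    @property
    def maximum(self):
        return 1.0

    def calculate(self, xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)


class PearsonsRSquared(PearsonsR):
    @property
    def minimum(self):
        return 0.0

    def calculate(self, xs, ys):
        return super().calculate(xs, ys) ** 2


def to_color_scale(calculator, value):
    """Map a value to [0, 1] using only the calculator's declared range."""
    return (value - calculator.minimum) / (calculator.maximum - calculator.minimum)


xs, ys = [1, 2, 3], [3, 2, 1]
for calc in (PearsonsR(), PearsonsRSquared()):
    value = calc.calculate(xs, ys)
    print(type(calc).__name__, value, to_color_scale(calc, value))
```

Because the view relies only on `minimum` and `maximum`, a measure with a different range can be added without touching the drawing code.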
comment:48 Changed 12 years ago by sforsten
- corrected displaying of the Text of StringConvertibleMatrixVisibilityDialog and its subclasses
- corrected sorting of EnhancedStringConvertibleMatrixView
comment:49 Changed 12 years ago by mkommend
- Owner changed from mkommend to sforsten
- Status changed from reviewing to assigned
Reviewing comments:
- Unify names of controls that are used in views (e.g., CorrelationCalcLabel, minimumLabel, HeatMapProgressBar)
- TimeFrameFeatureCorrelationView:
- TimeFrameTextBox should check for valid values in the validating event
- The correlation should be recalculated after the validated event of the TimeFrameTextBox was triggered. Currently it does not react to TAB.
- Move partitions from the FeatureCorrelationHelper to the AbstractFeatureCorrelationView and use indexes in the bwInfo.
- Remove *.resx files from repository.
- Add missing license headers (.designer files).
- Show / Hide rows throws an exception during dialog creation for the symbolic classification example.
- The selected variable for the timeframe correlation should be the target variable per default.
comment:50 Changed 12 years ago by mkommend
r8874: Minor code cleanup in feature correlation classes.
comment:51 Changed 12 years ago by sforsten
- Owner changed from sforsten to mkommend
- Status changed from assigned to reviewing
r8880: implemented changes suggested by mkommend in comment:49:ticket:1292
r8881: renamed some controls
Also corrected DeregisterContentEvents in AbstractFeatureCorrelationView. AbstractFeatureCorrelationView.resx can't be removed, because it contains an image of the possible colors for the correlation.
comment:52 Changed 12 years ago by mkommend
- Status changed from reviewing to readytorelease
comment:53 Changed 12 years ago by swagner
- Resolution set to done
- Status changed from readytorelease to closed
Please also add a view that shows the result of the Jarque-Bera test (normality test, implemented in alglib) for each variable.