Opened 5 years ago
Closed 3 years ago
#2031 closed enhancement (done)
Implement views for statistical hypothesis testing
| Reported by: | ascheibe | Owned by: | ascheibe |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.11 |
| Component: | Analysis | Version: | 3.3.10 |
| Keywords: | | Cc: | |
Description (last modified by ascheibe)
Besides using the charts of the RunCollectionView for analyzing runs there should also be the option to do statistical significance testing. Therefore new RunCollection views should be implemented that offer this functionality.
TODO: Check http://powerandsamplesize.com for a better way to do sample size estimation.
Change History (90)
comment:1 Changed 5 years ago by ascheibe
- Status changed from new to accepted
comment:2 Changed 5 years ago by ascheibe
comment:3 Changed 5 years ago by ascheibe
r9353 initial import of views for statistical testing
comment:4 Changed 5 years ago by ascheibe
r9355 added a view that shows boxplots for different sample sizes
comment:5 Changed 5 years ago by ascheibe
r9377 added sorting for the chart analysis view
comment:6 Changed 5 years ago by ascheibe
r9378 fixed some bugs and the coloring in the chart analysis view
comment:7 Changed 5 years ago by ascheibe
r9380 use EnhancedStringConvertibleMatrixView instead of StringConvertibleMatrixView in ResultCorrelationView
comment:8 Changed 5 years ago by ascheibe
r9382 the ResultCorrelationView can now also handle parameters and IntValues
comment:9 Changed 5 years ago by ascheibe
r9383 ResultCorrelationView is now sortable and some minor cosmetic improvements
comment:10 Changed 4 years ago by ascheibe
r9383 reorganized statistical testing ui
comment:11 Changed 4 years ago by ascheibe
r9398 fixed a bug in the correlation view
comment:12 Changed 4 years ago by ascheibe
- added exponential fitting
- added logarithmic fitting
- refactored fitting code
- updated license headers
comment:13 Changed 4 years ago by ascheibe
r9712 removed deprecated view
comment:14 Changed 4 years ago by ascheibe
- made operations in ChartAnalysisView asynchronous
- renamed views to go along with the other RunCollection views
comment:15 Changed 4 years ago by ascheibe
comment:16 Changed 4 years ago by ascheibe
r9721 added unit tests for the Kruskal Wallis test
comment:17 Changed 4 years ago by ascheibe
r9742 added a rich text box dialog for displaying formatted help texts
comment:18 Changed 4 years ago by ascheibe
- added documentation for chart analysis view
- some ui improvements
comment:19 Changed 4 years ago by ascheibe
r9759 some minor changes
comment:20 Changed 4 years ago by ascheibe
r9813 fixed a bug when calculating the percentage of equal groups
comment:21 Changed 4 years ago by ascheibe
r9904 updated chart analysis view to work with new progress handling
comment:22 Changed 4 years ago by ascheibe
r9908 improved chart analysis view
comment:23 Changed 4 years ago by ascheibe
r9909 removed useless split container from chart analysis view
comment:24 Changed 4 years ago by ascheibe
- renamed statistical run collection views
- implemented RunCollection events in statistical testing view
- incorporate content name into view caption
comment:25 Changed 4 years ago by ascheibe
r9912 some minor UI improvements
comment:26 Changed 4 years ago by ascheibe
- redesigned statistical testing view
- improved sample size influence view
comment:27 Changed 4 years ago by ascheibe
r9914 implemented rest of the reviewing comments for the statistical testing view
comment:28 Changed 4 years ago by ascheibe
r9917 adapted views to new help system
comment:29 Changed 4 years ago by ascheibe
r9922 adapted views to changes in help system
comment:30 Changed 4 years ago by ascheibe
r9923 fixed cross threading errors
comment:31 Changed 4 years ago by ascheibe
r9925 fixed size of controls in chart analysis view
comment:32 Changed 4 years ago by ascheibe
r9936 try to filter NaN values in correlation view and some other minor improvements
comment:33 Changed 4 years ago by ascheibe
- added a histogram to the StatisticalTestingView
- don't allow group sizes smaller than 6
comment:34 Changed 4 years ago by ascheibe
r9950 added Bonferroni-Holm adjusted p-values to the statistical testing view
comment:35 Changed 4 years ago by ascheibe
r9951 updated unit tests
comment:36 Changed 4 years ago by ascheibe
r9957 fixed pairwise tests
comment:37 Changed 4 years ago by ascheibe
r9968 fixed a bug in the SampleSizeInfluenceView
comment:38 Changed 4 years ago by ascheibe
r9998 added confidence intervals to sample size influence view
comment:39 Changed 4 years ago by ascheibe
r10016 added recommended sample size to sample size influence view
comment:40 Changed 4 years ago by ascheibe
r10017 improved sample size estimation and some minor improvements
comment:41 Changed 4 years ago by ascheibe
r10018 improved UI of statistical testing view
comment:42 Changed 4 years ago by ascheibe
- Owner changed from ascheibe to abeham
- Status changed from accepted to reviewing
Some things to note:
- I have implemented most of your reviewing comments where it was possible (e.g. the charting changes were not possible).
- At the moment this is one big plugin. On trunk integration, the views should go to Optimization.Views, the icon to Common.Resources and the additional EnumerableStatisticsExtensions to Common. Whether the rest of the code should become its own plugin or also go to Common is something we should discuss.
- I have implemented the estimation of sample sizes as described in several sources. There are two types (one "normal" and one for large sample sizes). In my opinion the first one always underestimates the necessary sample sizes and the other one always overestimates them. I think the problem (though this seems to be no problem in textbooks) is that we do not have an estimator for the variance and confidence intervals. I will have a look into this, but I still want to give you the ticket now as everything else is ready for review.
comment:43 Changed 4 years ago by ascheibe
r10019 added missing static modifiers and removed unused variable
comment:44 Changed 4 years ago by ascheibe
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.10
comment:45 Changed 3 years ago by ascheibe
- Milestone changed from HeuristicLab 3.3.10 to HeuristicLab 3.3.11
I'm moving this to 3.3.11 as it won't be ready for the next release.
comment:46 Changed 3 years ago by abeham
I just took a quick look at this. Generally, I'd say this is quite a good addition. Very nice workflow.
- The yellow exclamation mark icon should have a tooltip that explains it. I see "Data is normally distributed: !". It seems that the exclamation mark says this is not the case, but at what significance level?
- p-values should be rounded to 4 places after the point. Otherwise the text box might be too small to fit the whole number: often the important part (the exponent) is at the end. Instead of 2.93559215344167E-05, which would be chopped off to e.g. "2.9355921534416...", it should say 2.9356E-05. The text boxes should be at least wide enough to fit these 10 characters nicely.
- The histogram didn't look to be so useful, maybe a stacked histogram would be more useful with common bins for all.
- The p-value adjustment needs more explanation. It seems that you have used Bonferroni correction for the post-hoc analysis. In the Mann-Whitney-U test most got adjusted from 0.0001 to 0.0007, as you would expect when comparing to 7 other groups, but one adjusted p-value remained the same (0.00423...) and another one was adjusted from 0.00025.. to 0.0007. Above all, it seems strange to obtain such p-values; it rather seems as if this was the significance level.
- I don't see a possibility to set the significance level.
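To make the two numeric points above concrete, here is a minimal Python sketch. It is not the HeuristicLab code: the names `format_p_value` and `holm_adjust` are hypothetical, and the second function assumes the textbook Bonferroni-Holm step-down procedure (the adjustment named in r9950). Under that procedure the largest p-value is multiplied by 1 and so keeps its value, and the monotonicity step can give several tests the same adjusted value, which may or may not be what was observed here.

```python
def format_p_value(p):
    """Format a p-value with 4 digits after the point in scientific notation."""
    return "{:.4E}".format(p)


def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values in the original input order.

    Textbook step-down procedure (an assumption, not the ticket's code):
    sort ascending, multiply the k-th smallest (0-based) by (m - k),
    enforce monotonicity with a running maximum, and cap at 1.
    """
    m = len(p_values)
    # Materialize the sort order once instead of indexing a lazy sequence.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    print(format_p_value(2.93559215344167E-05))
    print(holm_adjust([0.01, 0.04, 0.03]))
```

In the second call the largest raw p-value (0.04) would stay at 0.04 after multiplication by 1, but the running maximum lifts it to the same adjusted value as 0.03.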
comment:47 Changed 3 years ago by ascheibe
- Owner changed from abeham to ascheibe
- Status changed from reviewing to assigned
comment:48 Changed 3 years ago by ascheibe
comment:49 Changed 3 years ago by ascheibe
r11375 updated license headers and version
comment:50 Changed 3 years ago by ascheibe
r11376 fixed RunCollection event handling in ChartAnalysisView
comment:51 Changed 3 years ago by ascheibe
- Description modified (diff)
comment:52 Changed 3 years ago by ascheibe
r11378 improved correlation view and some clean ups
comment:53 Changed 3 years ago by ascheibe
r11379 fixed some bugs
comment:54 follow-up: ↓ 68 Changed 3 years ago by ascheibe
mkommend's review comments:
- Statistical Testing View:
  - Check if icons are correct. The label should also change according to the icon and should also be displayed for the tests and not only for the normal distribution test
  - Add links in help view that describe the used algorithms in detail
  - Adjusted p-values are sometimes equal to p-values. Check if this is correct
  - t-Test values should be gray when used with data that is not normally distributed
  - For the normal distribution check, a check mark should only be given if all are normally distributed
  - Inherit StringConvertibleMatrixView and fix the column width or adjust to length of values
  - If an error occurs and the "not enough samples" dialog is shown, the loading bar is not hidden afterwards
  - If possible, a kernel density curve should also be displayed in the histogram
  - Check t-test for one/two sided
- Correlations View:
  - Should display the values as matrix, the method should be choosable
  - Sort inside results and parameters
- Chart Analysis View:
  - Percentiles seem to be switched
  - Change Relativ Error to Average Relative Error
  - log values should be removed from data table
  - avg/upper/lower values should be described in the help text
- Sample Size Influence View:
  - add help text
  - used draw without putting back from StatisticsEnumerable
- On trunk integration, move algorithms to Analysis plugin, create a separate plugin for the views
comment:55 Changed 3 years ago by ascheibe
- fixed column width of p-values
- started working on drawing a normal distribution over the histogram
comment:56 Changed 3 years ago by ascheibe
r11611 adapted statistical testing view to new histogram
comment:57 Changed 3 years ago by ascheibe
r11612 fixed a bug in the progress handling of the statistical testing view
comment:58 Changed 3 years ago by ascheibe
r11625 switched statistical testing branch to .NET 4.5
comment:59 Changed 3 years ago by ascheibe
r11644 implemented review comments for correlations view
comment:60 Changed 3 years ago by ascheibe
r11665 implemented review comments for chart analysis view
comment:61 Changed 3 years ago by ascheibe
r11670 worked on sample size influence view
comment:62 Changed 3 years ago by ascheibe
r11671 added documentation for sample size influence view
comment:63 Changed 3 years ago by ascheibe
r11673 fixed a small bug in Bonferroni-Holm adjustment and added more unit tests for it
comment:64 Changed 3 years ago by ascheibe
- expanded documentation for statistical testing view
- changed t-test to unpooled method
- removed sample size determination for t-test as this is probably not correct
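For readers unfamiliar with the "unpooled method": the sketch below assumes it refers to Welch's t-test, which does not pool the two sample variances. It is a minimal Python illustration, not the code from this changeset; the function name `welch_t` is hypothetical, and it returns only the test statistic and the Welch-Satterthwaite degrees of freedom (computing a p-value would additionally need the t-distribution).

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on samples a and b."""
    na, nb = len(a), len(b)
    # Squared standard errors of each sample mean; the variances are NOT pooled.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Unlike the pooled variant, the degrees of freedom are generally not an integer and shrink when one group has a much larger variance than the other.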
comment:65 Changed 3 years ago by ascheibe
- fixed a bug in Cohen's d / Hedges' g calculation
- fixed calculation of pairwise tests (no more columns with only zeroes)
- some refactoring
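As background on the two effect size measures, here is a minimal Python sketch using the common textbook definitions: Cohen's d with a pooled standard deviation, and Hedges' g as d times the approximate small-sample correction 1 - 3 / (4(n1 + n2) - 9). Whether the HeuristicLab implementation uses exactly these variants is an assumption; the function names are hypothetical.

```python
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def hedges_g(a, b):
    """Hedges' g: Cohen's d with the approximate small-sample bias correction."""
    n = len(a) + len(b)
    correction = 1 - 3 / (4 * n - 9)
    return cohens_d(a, b) * correction


if __name__ == "__main__":
    x, y = [2, 4, 6, 8, 10], [1, 2, 3, 4, 5]
    print(cohens_d(x, y), hedges_g(x, y))
```

The correction factor is always below 1, so g is slightly smaller in magnitude than d, and the gap closes as the combined sample size grows.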
comment:66 Changed 3 years ago by ascheibe
r11693 added dialog for configuring the SignificanceLevel and renamed view
comment:67 Changed 3 years ago by ascheibe
r11695 added more information to the UI about the results of the statistical tests
comment:68 in reply to: ↑ 54 Changed 3 years ago by ascheibe
Replying to ascheibe:
mkommend's review comments: ...
I did not gray out the t-test values when the data is not normally distributed. With bigger sample sizes this does not seem to be much of a problem. See http://stats.stackexchange.com/questions/9573/t-test-for-non-normal-when-n50
comment:69 Changed 3 years ago by ascheibe
- improved code of statistical testing view
- improved documentation
comment:70 Changed 3 years ago by ascheibe
r11697 improved code of views
comment:71 Changed 3 years ago by ascheibe
- fixed confidence intervals calculation
- added more unit tests
- some cosmetic changes
comment:72 Changed 3 years ago by ascheibe
r11702 added unit tests for effect size measures
comment:73 Changed 3 years ago by ascheibe
Trunk integration
r11703 moved statistic algs and unit tests to trunk
comment:74 Changed 3 years ago by ascheibe
r11704 moved annotations-default icon to common.resources
comment:75 Changed 3 years ago by ascheibe
- moved statistics views to a new plugin (HL.Analysis.Statistics.Views) in trunk
- fixed namespaces of unit tests
comment:76 Changed 3 years ago by ascheibe
r11706 deleted statistical testing branch
comment:77 Changed 3 years ago by ascheibe
r11715 made it possible to set axis in boxplot view and enabled functionality in statistical tests view for showing data in the box plot
comment:78 Changed 3 years ago by ascheibe
r11717 fixed layout of statistical test view
comment:79 Changed 3 years ago by ascheibe
r11725 added statistical views project to project dependencies for HeuristicLab-3.3
comment:80 Changed 3 years ago by ascheibe
- Owner changed from ascheibe to mkommend
- Status changed from assigned to reviewing
comment:81 Changed 3 years ago by ascheibe
- Version changed from branch to 3.3.10
comment:82 Changed 3 years ago by ascheibe
r11757 added svn:ignore properties for statistics.views
comment:83 Changed 3 years ago by ascheibe
- Description modified (diff)
comment:84 Changed 3 years ago by ascheibe
- fixed a problem where an exception was thrown when the same progress bar was removed multiple times
- remove data points from histogram before adding new ones
comment:85 Changed 3 years ago by mkommend
Review started at r11703 (trunk integration).
Review comments:
- Why are the views in a separate plugin, but the other classes in HL.Analysis?
  - I just found that we can't put them into Analysis.Views because of a dependency on the Optimization.Views plugin. Optimization.Views also has a reference to Analysis.Views, so we would get a cyclic dependency. And I don't want to put the views into Optimization.Views because of the alglib dependency. So I will leave this in its own plugin for now.
- EnumerableStatistics should be located in HL.Common
  - Not possible due to alglib reference.
- ConfidenceIntervals iterates 4x over the values !!!
- BonferroniHolm uses elementAt extensively. A list or an array would be more appropriate instead of IEnumerable.
- Move fittings to an extra subfolder.
- I don't like the name fitting at all.
- SampleSizeDetermination does not use the passed confidence for interval calculation
- I don't understand the difference between the two methods for sample size estimation and when to use which
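The comments about iterating several times over the values and about the unused confidence parameter can be illustrated with a short Python sketch. It is not the HeuristicLab code: the function name is hypothetical, it uses Welford's single-pass update for mean and variance, and it assumes a normal approximation for the interval (a t-quantile would be more appropriate for small samples).

```python
from statistics import NormalDist


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) for the mean, reading `values` exactly once."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        # Welford's update: running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two values")
    std_err = (m2 / (n - 1) / n) ** 0.5
    # The passed confidence level determines the quantile (normal approximation).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


if __name__ == "__main__":
    # Works on a one-shot iterator because the data is traversed only once.
    print(mean_confidence_interval(iter([1, 2, 3, 4, 5]), confidence=0.95))
```

Because the loop consumes the input only once, the function also accepts lazy sequences that cannot be enumerated repeatedly without recomputation.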
comment:86 Changed 3 years ago by ascheibe
- Owner changed from mkommend to ascheibe
- Status changed from reviewing to assigned
comment:87 Changed 3 years ago by ascheibe
Also:
- fix name handling of DataRow
- rename linear fitting into slope and intercept
- improve IFitting interface
comment:88 Changed 3 years ago by ascheibe
- Owner changed from ascheibe to mkommend
- Status changed from assigned to reviewing
r11914 implemented review comments
comment:89 Changed 3 years ago by mkommend
- Owner changed from mkommend to ascheibe
- Status changed from reviewing to readytorelease
comment:90 Changed 3 years ago by ascheibe
- Resolution set to done
- Status changed from readytorelease to closed
r11919: merged revisions 11703,11704,11705,11706,11715,11717,11725,11757,11837,11914 into stable
r9351 added branch for Statistical Hypothesis Testing