Opened 4 years ago

Closed 2 years ago

#2031 closed enhancement (done)

Implement views for statistical hypothesis testing

Reported by: ascheibe
Owned by: ascheibe
Priority: medium
Milestone: HeuristicLab 3.3.11
Component: Analysis
Version: 3.3.10
Keywords:
Cc:

Description (last modified by ascheibe)

Besides using the charts of the RunCollectionView for analyzing runs, there should also be an option to do statistical significance testing. Therefore, new RunCollection views that offer this functionality should be implemented.

TODO: Check http://powerandsamplesize.com for a better way to do sample size estimation.

Change History (90)

comment:1 Changed 4 years ago by ascheibe

  • Status changed from new to accepted

comment:2 Changed 4 years ago by ascheibe

r9351 added branch for Statistical Hypothesis Testing

comment:3 Changed 4 years ago by ascheibe

r9353 initial import of views for statistical testing

Last edited 4 years ago by ascheibe

comment:4 Changed 4 years ago by ascheibe

r9355 added a view that shows boxplots for different sample sizes

comment:5 Changed 4 years ago by ascheibe

r9377 added sorting for the chart analysis view

comment:6 Changed 4 years ago by ascheibe

r9378 fixed some bugs and the coloring in the chart analysis view

comment:7 Changed 4 years ago by ascheibe

r9380 use EnhancedStringConvertibleMatrixView instead of StringConvertibleMatrixView in ResultCorrelationView

comment:8 Changed 4 years ago by ascheibe

r9382 the ResultCorrelationView can now also handle parameters and IntValues

comment:9 Changed 4 years ago by ascheibe

r9383 ResultCorrelationView is now sortable and some minor cosmetic improvements

comment:10 Changed 4 years ago by ascheibe

r9383 reorganized statistical testing ui

comment:11 Changed 4 years ago by ascheibe

r9398 fixed a bug in the correlation view

comment:12 Changed 4 years ago by ascheibe

r9706

  • added exponential fitting
  • added logarithmic fitting
  • refactored fitting code
  • updated license headers

comment:13 Changed 4 years ago by ascheibe

r9712 removed deprecated view

comment:14 Changed 4 years ago by ascheibe

r9713

  • made operations in ChartAnalysisView asynchronous
  • renamed views to go along with the other RunCollection views

comment:15 Changed 4 years ago by ascheibe

r9717 ported changes from the boxplot view from r9435 to the sample size influence view

comment:16 Changed 4 years ago by ascheibe

r9721 added unit tests for the Kruskal Wallis test

comment:17 Changed 4 years ago by ascheibe

r9742 added a rich text box dialog for displaying formatted help texts

comment:18 Changed 4 years ago by ascheibe

r9749

  • added documentation for chart analysis view
  • some ui improvements

comment:19 Changed 4 years ago by ascheibe

r9759 some minor changes

comment:20 Changed 4 years ago by ascheibe

r9813 fixed a bug when calculating the percentage of equal groups

comment:21 Changed 4 years ago by ascheibe

r9904 updated chart analysis view to work with new progress handling

comment:22 Changed 4 years ago by ascheibe

r9908 improved chart analysis view

comment:23 Changed 4 years ago by ascheibe

r9909 removed useless split container from chart analysis view

comment:24 Changed 4 years ago by ascheibe

r9911

  • renamed statistical run collection views
  • implemented RunCollection events in statistical testing view
  • incorporate content name into view caption

comment:25 Changed 4 years ago by ascheibe

r9912 some minor UI improvements

comment:26 Changed 4 years ago by ascheibe

r9913

  • redesigned statistical testing view
  • improved sample size influence view

comment:27 Changed 4 years ago by ascheibe

r9914 implemented rest of the reviewing comments for the statistical testing view

comment:28 Changed 4 years ago by ascheibe

r9917 adapted views to new help system

comment:29 Changed 4 years ago by ascheibe

r9922 adapted views to changes in help system

comment:30 Changed 4 years ago by ascheibe

r9923 fixed cross threading errors

comment:31 Changed 4 years ago by ascheibe

r9925 fixed size of controls in chart analysis view

comment:32 Changed 4 years ago by ascheibe

r9936 try to filter NaN values in correlation view and some other minor improvements

comment:33 Changed 4 years ago by ascheibe

r9937

  • added a histogram to the StatisticalTestingView
  • don't allow group sizes smaller than 6

comment:34 Changed 4 years ago by ascheibe

r9950 added Bonferroni-Holm adjusted p-values to the statistical testing view

comment:35 Changed 4 years ago by ascheibe

r9951 updated unit tests

comment:36 Changed 4 years ago by ascheibe

r9957 fixed pairwise tests

comment:37 Changed 4 years ago by ascheibe

r9968 fixed a bug in the SampleSizeInfluenceView

comment:38 Changed 4 years ago by ascheibe

r9998 added confidence intervals to sample size influence view

comment:39 Changed 4 years ago by ascheibe

r10016 added recommended sample size to sample size influence view

comment:40 Changed 4 years ago by ascheibe

r10017 improved sample size estimation and some minor improvements

comment:41 Changed 4 years ago by ascheibe

r10018 improved UI of statistical testing view

comment:42 Changed 4 years ago by ascheibe

  • Owner changed from ascheibe to abeham
  • Status changed from accepted to reviewing

Some things to note:

  • I have implemented most of your reviewing comments where it was possible (e.g. the charting changes were not possible).
  • At the moment this is one big plugin. On trunk integration, the views should go to Optimization.Views, the icon to Common.Resources and the additional EnumerableStatisticsExtensions to Common. Whether the rest of the code should become its own plugin or also go to Common is something we should discuss.
  • I have implemented the estimation of sample sizes as described in several sources. There are 2 types (one "normal" and one for large sample sizes). In my opinion the first one always underestimates the necessary sample sizes, the other one always overestimates sample sizes. I think the problem (though this seems to be no problem in textbooks) is that we do not have an estimator for the variance and confidence intervals. I will have a look into this but I still want to give you the ticket now as everything else is ready for review.
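
For orientation, a minimal sketch of the common textbook formula for the sample size needed to estimate a mean within a given margin of error. This is not necessarily either of the two variants implemented in the branch; the function name and parameters are hypothetical, and `sigma` is assumed to be known or estimated beforehand, which is exactly the difficulty mentioned above.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Textbook sample size n = (z * sigma / margin)^2, rounded up.

    sigma:  assumed (or previously estimated) standard deviation
    margin: desired half-width of the confidence interval
    """
    # two-sided normal quantile for the requested confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size_for_mean(sigma=10.0, margin=2.0))
```

Because the result grows with the square of `sigma`, a poor variance estimate shifts the recommended sample size considerably in either direction.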
Last edited 2 years ago by ascheibe

comment:43 Changed 4 years ago by ascheibe

r10019 added missing static modifiers and removed unused variable

comment:44 Changed 3 years ago by ascheibe

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.10

comment:45 Changed 3 years ago by ascheibe

  • Milestone changed from HeuristicLab 3.3.10 to HeuristicLab 3.3.11

I'm moving this to 3.3.11 as it won't be ready for the next release.

comment:46 Changed 3 years ago by abeham

I just took a quick look at this. Generally, I'd say this is quite a good addition. Very nice workflow.

  • The yellow exclamation mark icon should have a tooltip that explains it. I see "Data is normally distributed: !". It seems that the exclamation mark says this is not the case, but at what significance level?
  • p-values should be rounded to 4 places after the point. Otherwise the text box might be too small to fit the whole number: often the important part (the exponent) is at the end. Instead of 2.93559215344167E-05, which would be chopped off to e.g. "2.9355921534416...", it should say 2.9356E-05. The text boxes should be wide enough for these 10 characters to fit nicely.
  • The histogram didn't look very useful; maybe a stacked histogram with common bins for all groups would be more useful.
  • The p-value adjustment needs more explanation. It seems that you have used Bonferroni correction for the post-hoc analysis. In the Mann-Whitney-U test most got adjusted from 0.0001 to 0.0007, as you would expect when comparing to 7 other groups, but one adjusted p-value remained the same (0.00423...) and another one was adjusted from 0.00025.. to 0.0007. Above all, it seems strange to obtain such p-values; it rather seems as if this was the significance level.
  • I don't see a possibility to set the significance level.
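
The rounding requested for the p-values can be illustrated with a short sketch. It is written in Python for brevity and is not the view's actual code; the helper name `format_p_value` is hypothetical.

```python
def format_p_value(p, digits=4):
    """Format a p-value in scientific notation with a fixed number of mantissa digits."""
    return f"{p:.{digits}E}"

print(format_p_value(2.93559215344167E-05))  # 2.9356E-05
```

With four digits after the point, a p-value with a two-digit exponent always takes 10 characters, so the text box width can be fixed accordingly.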
Last edited 2 years ago by ascheibe

comment:47 Changed 3 years ago by ascheibe

  • Owner changed from abeham to ascheibe
  • Status changed from reviewing to assigned

comment:48 Changed 3 years ago by ascheibe

r11368 adapted sample size view to the changes of #2120

comment:49 Changed 3 years ago by ascheibe

r11375 updated license headers and version

comment:50 Changed 3 years ago by ascheibe

r11376 fixed RunCollection event handling in ChartAnalysisView

comment:51 Changed 3 years ago by ascheibe

  • Description modified (diff)

comment:52 Changed 3 years ago by ascheibe

r11378 improved correlation view and some clean ups

comment:53 Changed 3 years ago by ascheibe

r11379 fixed some bugs

comment:54 follow-up: Changed 2 years ago by ascheibe

mkommend's review comments:

  • Statistical Testing View:
    • Check if icons are correct. The label should also change according to the icon and should also be displayed for the tests, not only for the normal distribution test
    • Add links in help view that describe the used algorithms in detail
    • Adjusted p-values are sometimes equal to p-values. Check if this is correct
    • t-Test values should be gray when used with data that is not normally distributed
    • For the normal distribution check, a check mark should only be given if all are normally distributed
    • Inherit StringConvertibleMatrixView and fix the column width or adjust to length of values
    • If an error occurs and the "not enough samples" dialog is shown, the loading bar is not hidden afterwards
    • If possible, a kernel density curve should also be displayed in the histogram
    • Check t-test for one/two sided
  • Correlations View:
    • Should display the values as a matrix; the method should be selectable
    • Sort inside results and parameters
  • Chart Analysis View:
    • Percentiles seem to be switched
    • Change Relativ Error to Average Relative Error
    • log values should be removed from data table
    • avg/upper/lower values should be described in the help text

  • Sample Size Influence View
    • add help text
    • use draw without putting back (sampling without replacement) from StatisticsEnumerable

  • On trunk integration, move algorithms to Analysis plugin, create an own plugin for the views
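
On the question of adjusted p-values that equal the raw p-values: assuming the view applies the standard Holm step-down procedure, this is expected behaviour. The largest p-value is multiplied by 1, and the monotonicity step can lift smaller products up to an earlier, larger one. A minimal sketch in Python follows; the function name and the input values are illustrative only and are not taken from the implementation.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # indices of the p-values sorted ascending
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # smallest p is multiplied by m, the next by m - 1, ..., the largest by 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # enforce monotonicity: an adjusted value never drops below an earlier one
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# illustrative values for 7 pairwise comparisons
raw = [0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.00025, 0.00423]
for p, adj in zip(raw, holm_adjust(raw)):
    print(f"{p:.5f} -> {adj:.5f}")
```

In this example the five smallest values all end up at 0.0007, the value 0.00025 is lifted to 0.0007 by the monotonicity step, and the largest value stays unchanged.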
Last edited 2 years ago by ascheibe

comment:55 Changed 2 years ago by ascheibe

r11601

  • fixed column width of p-values
  • started working on drawing a normal distribution over the histogram

comment:56 Changed 2 years ago by ascheibe

r11611 adapted statistical testing view to new histogram

comment:57 Changed 2 years ago by ascheibe

r11612 fixed a bug in the progress handling of the statistical testing view

comment:58 Changed 2 years ago by ascheibe

r11625 switched statistical testing branch to .NET 4.5

comment:59 Changed 2 years ago by ascheibe

r11644 implemented review comments for correlations view

comment:60 Changed 2 years ago by ascheibe

r11665 implemented review comments for chart analysis view

comment:61 Changed 2 years ago by ascheibe

r11670 worked on sample size influence view

comment:62 Changed 2 years ago by ascheibe

r11671 added documentation for sample size influence view

comment:63 Changed 2 years ago by ascheibe

r11673 fixed a small bug in Bonferroni-Holm adjustment and added more unit tests for it

comment:64 Changed 2 years ago by ascheibe

r11691

  • expanded documentation for statistical testing view
  • changed t-test to unpooled method
  • removed sample size determination for t-test as this is probably not correct
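
For reference, a minimal sketch of an unpooled two-sample t-test, assuming "unpooled" here refers to Welch's t-test. It is written in Python with hypothetical names and only computes the test statistic and the Welch-Satterthwaite degrees of freedom; the p-value would additionally require the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unpooled two-sample t-test."""
    na, nb = len(a), len(b)
    # squared standard errors of the two sample means
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.4f}, df = {df:.4f}")
```

Unlike the pooled variant, this does not assume equal variances in both groups, which is why the degrees of freedom are generally not an integer.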

comment:65 Changed 2 years ago by ascheibe

r11692

  • fixed a bug in Cohen's d / Hedges' g calculation
  • fixed calculation of pairwise tests (no more columns with only zeroes)
  • some refactoring
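
As a reference for the two effect size measures, a minimal sketch using the common textbook definitions (pooled standard deviation for Cohen's d, small-sample correction factor for Hedges' g). The function names are hypothetical and the trunk implementation may differ in detail.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def hedges_g(a, b):
    """Hedges' g: Cohen's d times the usual approximate small-sample correction."""
    n = len(a) + len(b)
    correction = 1 - 3 / (4 * n - 9)
    return cohens_d(a, b) * correction

print(cohens_d([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(hedges_g([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

The correction factor is always below 1 and approaches 1 for large samples, so g is slightly smaller in magnitude than d.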

comment:66 Changed 2 years ago by ascheibe

r11693 added dialog for configuring the SignificanceLevel and renamed view

comment:67 Changed 2 years ago by ascheibe

r11695 added more information to the UI about the results of the statistical tests

comment:68 in reply to: ↑ 54 Changed 2 years ago by ascheibe

Replying to ascheibe:

I did not gray out the t-test values when the data is not normally distributed. With bigger sample sizes this does not seem to be much of a problem. See http://stats.stackexchange.com/questions/9573/t-test-for-non-normal-when-n50

mkommend's review comments: ...

Last edited 2 years ago by ascheibe

comment:69 Changed 2 years ago by ascheibe

r11696

  • improved code of statistical testing view
  • improved documentation

comment:70 Changed 2 years ago by ascheibe

r11697 improved code of views

comment:71 Changed 2 years ago by ascheibe

r11699

  • fixed confidence intervals calculation
  • added more unit tests
  • some cosmetic changes

comment:72 Changed 2 years ago by ascheibe

r11702 added unit tests for effect size measures

comment:73 Changed 2 years ago by ascheibe

Trunk integration

r11703 moved statistic algs and unit tests to trunk

Last edited 2 years ago by ascheibe

comment:74 Changed 2 years ago by ascheibe

r11704 moved annotations-default icon to common.resources

comment:75 Changed 2 years ago by ascheibe

r11705

  • moved statistics views to a new plugin (HL.Analysis.Statistics.Views) in trunk
  • fixed namespaces of unit tests

comment:76 Changed 2 years ago by ascheibe

r11706 deleted statistical testing branch

comment:77 Changed 2 years ago by ascheibe

r11715 made it possible to set axis in boxplot view and enabled functionality in statistical tests view for showing data in the box plot

comment:78 Changed 2 years ago by ascheibe

r11717 fixed layout of statistical test view

comment:79 Changed 2 years ago by ascheibe

r11725 added statistical views project to project dependencies for HeuristicLab-3.3

comment:80 Changed 2 years ago by ascheibe

  • Owner changed from ascheibe to mkommend
  • Status changed from assigned to reviewing

comment:81 Changed 2 years ago by ascheibe

  • Version changed from branch to 3.3.10

comment:82 Changed 2 years ago by ascheibe

r11757 added svn:ignore properties for statistics.views

comment:83 Changed 2 years ago by ascheibe

  • Description modified (diff)

comment:84 Changed 2 years ago by ascheibe

r11837

  • fixed a problem where an exception was thrown when the same progress bar was removed multiple times
  • remove data points from histogram before adding new ones

comment:85 Changed 2 years ago by mkommend

Review started at r11703 (trunk integration).

Review comments:

  • Why are the views in a separate plugin, but the other classes in HL.Analysis?
    • I just found that we can't put them into Analysis.Views because of a dependency on the Optimization.Views plugin. Optimization.Views also has a reference to Analysis.Views, so we would get a cyclic dependency. And I don't want to put the views into Optimization.Views because of the alglib dependency. So I will leave this in its own plugin for now.
  • EnumerableStatistics should be located in HL.Common
    • Not possible due to alglib reference.
  • ConfidenceIntervals iterates 4x over the values !!!
  • BonferroniHolm uses ElementAt extensively. A list or an array would be more appropriate than an IEnumerable.
  • Move fittings to an extra subfolder.
  • I don't like the name fitting at all.
  • SampleSizeDetermination does not use the passed confidence for interval calculation
  • I don't understand the difference between the two methods for sample size estimation and when to use which
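
To illustrate the points about repeated iteration and the unused confidence parameter, here is a minimal sketch that computes a confidence interval for the mean in a single pass (Welford's update) and actually uses the passed confidence level. It is written in Python with hypothetical names, and it assumes a normal-approximation interval, which may differ from what ConfidenceIntervals does.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for the mean, computed in one pass over values."""
    n, running_mean, m2 = 0, 0.0, 0.0
    for x in values:  # works for generators too; the input is consumed once
        n += 1
        delta = x - running_mean
        running_mean += delta / n
        m2 += delta * (x - running_mean)
    if n < 2:
        raise ValueError("need at least two values")
    std_err = math.sqrt(m2 / (n - 1) / n)
    # two-sided quantile derived from the passed confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return running_mean - z * std_err, running_mean + z * std_err

print(mean_confidence_interval(iter([1, 2, 3, 4, 5])))
```

The same idea applies to the ElementAt remark: materializing the sequence once, or processing it in a single pass, avoids re-enumerating a lazy sequence for every access.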
Last edited 2 years ago by ascheibe

comment:86 Changed 2 years ago by ascheibe

  • Owner changed from mkommend to ascheibe
  • Status changed from reviewing to assigned

comment:87 Changed 2 years ago by ascheibe

Also:

  • fix name handling of DataRow
  • rename linear fitting into slope and intercept
  • improve IFitting interface

comment:88 Changed 2 years ago by ascheibe

  • Owner changed from ascheibe to mkommend
  • Status changed from assigned to reviewing

r11914 implemented review comments

comment:89 Changed 2 years ago by mkommend

  • Owner changed from mkommend to ascheibe
  • Status changed from reviewing to readytorelease

comment:90 Changed 2 years ago by ascheibe

  • Resolution set to done
  • Status changed from readytorelease to closed

r11919: merged revisions 11703,11704,11705,11706,11715,11717,11725,11757,11837,11914 into stable
