A page for collecting and discussing statistical analysis of metaheuristic optimization experiments.

In metaheuristic optimization experiments we measure the outcome of a stochastic process. From these measurements we hope to estimate the mean of the outcome's distribution. The outcome is usually the quality reached or the time/number of iterations required to reach a given quality. These outcomes are random variables with unknown distributions.
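
As a rough illustration of estimating the mean from repeated runs without assuming a particular distribution, the following sketch computes a percentile bootstrap confidence interval. The function name `bootstrap_mean_ci`, its parameters, and the sample values are all hypothetical and chosen only for this example.

```python
import random
import statistics


def bootstrap_mean_ci(samples, confidence=0.95, resamples=2000, seed=42):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(samples)
    # Resample with replacement and record the mean of every resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * resamples)]
    upper = means[min(resamples - 1, int((1.0 - tail) * resamples))]
    return lower, upper


# Hypothetical best qualities from 10 independent runs of one algorithm.
qualities = [101.2, 98.7, 103.4, 99.9, 100.5, 97.8, 102.1, 100.0, 99.3, 101.7]
mean = statistics.fmean(qualities)
low, high = bootstrap_mean_ci(qualities)
print(f"mean = {mean:.2f}, 95% CI approx. [{low:.2f}, {high:.2f}]")
```

With as few as 10 runs the interval is only a coarse indication; more runs give a more trustworthy estimate.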

The goal of such experiments is to show that one process achieves a better outcome than another. There are several ways to show this:
 * Visual analysis methods
 * Statistical hypothesis tests for inequality of two means

== Statistical Background ==
 * [http://en.wikipedia.org/wiki/Behrens%E2%80%93Fisher_problem Behrens-Fisher Problem]
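
The Behrens-Fisher problem concerns comparing the means of two normally distributed populations whose variances are not assumed to be equal. Welch's t-test is a commonly used approximate solution. For orientation, its test statistic and approximate degrees of freedom can be written as follows, where the symbols are notation introduced here: sample means \bar{x}_i, sample variances s_i^2 and sample sizes n_i.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
                 {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
```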

== Statistical Analysis Methods ==
 * Single comparison
  * [http://en.wikipedia.org/wiki/T-test t-test]
  * [http://en.wikipedia.org/wiki/Mann-Whitney_U_test Mann-Whitney U test]
  * [http://en.wikipedia.org/wiki/Welch%27s_t_test Welch's t-test]
 * Multiple comparison
  * [http://en.wikipedia.org/wiki/ANOVA ANOVA]
  * [http://en.wikipedia.org/wiki/Friedman_test Friedman test]
  * [http://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance Kruskal-Wallis]
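
To make one of the single-comparison tests concrete, here is a minimal standard-library sketch of the Mann-Whitney U test using the normal approximation. It is not taken from any of the libraries discussed on this page; the function name `mann_whitney_u` and the sample data are hypothetical. Ties receive average ranks, but the variance is not tie-corrected and no continuity correction is applied, so the p-value is approximate and unsuitable for very small samples.

```python
import math
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of sample a, approximate p-value).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted(
        (value, group) for group, data in enumerate((a, b)) for value in data
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += average_rank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0
        )
        i = j
    n_a, n_b = len(a), len(b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical final qualities of two algorithms (lower is better).
algorithm_a = [12.1, 11.8, 12.5, 11.9, 12.3, 12.0, 11.7, 12.2]
algorithm_b = [12.9, 13.1, 12.8, 13.4, 12.7, 13.0, 13.3, 12.6]
u, p = mann_whitney_u(algorithm_a, algorithm_b)
print(f"U = {u}, approximate p = {p:.4f}")
```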

== Visual Analysis Methods ==
 * In statistical analysis, visual methods are often preferred because they can encode far more data and give a clearer picture for expert human interpretation. Several possibilities exist:

||= Name =||= Example =||= Purpose =||
|| Bubble charts || [[Image(http://upload.wikimedia.org/wikipedia/commons/thumb/8/83/3Variable_BubbleChart.svg/240px-3Variable_BubbleChart.svg.png)]] || Bubbles are fun! ||
|| Density plots / histograms || [[Image(http://upload.wikimedia.org/wikipedia/commons/thumb/3/37/P_glu_given_diabetes.png/360px-P_glu_given_diabetes.png)]] || Comparing/Estimating the exact shapes of the underlying distributions ||
|| Boxplots || [[Image(http://upload.wikimedia.org/wikipedia/commons/thumb/f/fa/Michelsonmorley-boxplot.svg/300px-Michelsonmorley-boxplot.svg.png)]] || Compact and efficient comparison of distributions ||
|| Forest plots || [[Image(http://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Generic_forest_plot.png/300px-Generic_forest_plot.png)]] || Comparing confidence intervals makes it easier to see whether a difference is likely to be significant ||
|| Q-Q plots || [[Image(http://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Normal_normal_qq.svg/300px-Normal_normal_qq.svg.png)]] || Estimating whether data fits a certain distribution ||
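
As an example of what goes into one of these plots, the sketch below computes the point pairs of a normal Q-Q plot using only the standard library. The function name `normal_qq_points`, the plotting positions (i - 0.5) / n, and the sample data are choices made for this illustration; plotting the pairs is left to whatever charting tool is available.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(samples):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot.

    Needs at least two samples that are not all identical, because the
    reference normal distribution is fitted to the sample mean and
    standard deviation.
    """
    ordered = sorted(samples)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    return [
        (fitted.inv_cdf((i - 0.5) / n), value)
        for i, value in enumerate(ordered, start=1)
    ]


# Hypothetical run results.
results = [3.0, 1.0, 2.0, 5.0, 4.0]
for theoretical, observed in normal_qq_points(results):
    print(f"{theoretical:8.3f} {observed:8.3f}")
```

If the data are roughly normal, the pairs lie close to the diagonal; systematic curvature hints at skew or heavy tails.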

== Possible Workflow / Methodology ==
 1. Test the data for e.g. normality and equality of variances to decide whether to apply parametric or non-parametric tests.
 1. For multiple comparisons, perform ANOVA, the Friedman test, or a similar test; otherwise perform a single comparison with the t-test, Mann-Whitney U test, etc.
 1. If the multiple-comparison test is significant, follow up with pairwise comparisons using post hoc adjustments.
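
The post hoc adjustment in the last step can take several forms. As one possibility, the sketch below implements the Holm step-down adjustment of pairwise p-values. The page does not prescribe this particular method; the function name `holm_adjust` and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

A comparison is then declared significant if its adjusted p-value is below the chosen significance level.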

One problem with such a workflow is that error accumulates with every step: if the test for normality has a certain error rate and ANOVA has a certain error rate, the error of the final conclusion is a combination of the two. We should also identify and exclude cases in which applying the tests would probably not be valid.
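
As a simplified illustration of how such errors combine, assume that step i errs with probability \alpha_i and that the steps err independently. In a real workflow the later test is chosen based on the earlier one, so independence is only an approximation. The probability of at least one error over k steps is then:

```latex
P(\text{at least one error}) = 1 - \prod_{i=1}^{k} (1 - \alpha_i) \le \sum_{i=1}^{k} \alpha_i,
\qquad
\text{e.g. } \alpha_1 = \alpha_2 = 0.05: \quad 1 - 0.95^2 = 0.0975
```

The upper bound on the right holds even without the independence assumption.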

== Critique == 
 1. Steven Goodman. 2008. A Dirty Dozen: Twelve P-Value Misconceptions. Seminars in Hematology Volume 45, Issue 3, July 2008, Pages 135–140
 1. Jacob Cohen. 1994. The Earth is Round (p < 0.05). American Psychologist. http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf
 1. Hubbard, Bayarri. 2003. P-Values are not Error Probabilities. http://ftp.isds.duke.edu/WorkingPapers/03-26.pdf

== References ==
 * García, Fernández, Luengo, Herrera. 2010. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences 180, pp. 2044–2064. (http://sci2s.ugr.es/sicidm/pdf/2010-Garcia-INS.pdf)
 * García, Molina, Lozano, Herrera. 2009. A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: a case study on the CEC’2005 Special Session on Real Parameter Optimization. Journal of Heuristics 15, pp. 617–644. (http://sci2s.ugr.es/publications/ficheros/2009-garcia-JH.pdf)

== External Libraries ==
 * [http://www.meta-numerics.net/ Meta.Numerics] implements some statistical tests (e.g. ANOVA, Mann-Whitney U, ...) but is licensed under the Ms-PL, which is incompatible with the GPL.
 * [http://code.google.com/p/accord/ Accord.NET] / [http://www.aforgenet.com/ AForge.NET]: From the Accord.NET website: ''"Accord.NET is a framework for scientific computing in .NET. The framework builds upon AForge.NET, an also popular framework for image processing, supplying new tools and libraries. Those libraries encompass a wide range of scientific computing applications, such as statistical data processing, machine learning, pattern recognition, including but not limited to, computer vision and computer audition. The framework offers a large number of probability distributions, hypothesis tests, kernel functions and support for most popular performance measurements techniques." '' These libraries provide much more than we need. Fortunately, the math/statistics parts can be extracted very easily because they are separate assemblies. Accord.NET implements OneWayAnova, TwoWayAnova(1,2,3), T-test, Mann-Whitney Wilcoxon, Kolmogorov-Smirnov and a lot more. Both are licensed under the LGPL, so licensing is not a problem.

== Links ==
 * [http://moses.us.es/statservice/ STATService] for comparison of metaheuristic results
 * [http://en.wikipedia.org/wiki/Post-hoc_analysis Post-hoc analysis]
 * [http://vassarstats.net/textbook/ch15a.html Friedman test example]
 * [http://seriousstats.wordpress.com/2012/02/14/friedman/ Criticism of the Friedman test]
 * [http://www.wiwi.uni-muenster.de/ioeb/en/organisation/pfaff/stat_overview_table.html Statistical Tests Overview]

== Significant ==
[[Image(significant.png)]]

[[Image(http://imgs.xkcd.com/comics/null_hypothesis.png)]]