Opened 8 weeks ago

Last modified 8 weeks ago

#3090 accepted enhancement

Modern implementation for Pearson R correlation and covariance

Reported by: bburlacu
Owned by: bburlacu
Priority: medium
Milestone: HeuristicLab 3.3.x Backlog
Component: Problems.DataAnalysis
Version: trunk
Keywords:
Cc:

Description

Our OnlinePearsonsRCalculator computes Pearson's R correlation coefficient from the variances and covariance produced by the OnlineMeanAndVarianceCalculator and the OnlineCovarianceCalculator, which use algorithms by Knuth (variance) and Welford (covariance).

Implementation-wise, the OnlinePearsonsRCalculator class encapsulates three other calculator objects. For every pair of values, each of the three calculators performs essentially the same checks on the input, which is inefficient and makes the code more complicated and error-prone.

We could instead use an extension of Welford's algorithm by Schubert et al. [1] that computes everything (means, variances, covariances) in one pass, giving a simpler, numerically stable, and faster implementation.

[1] https://dl.acm.org/doi/10.1145/3221269.3223036
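
To illustrate the idea, here is a minimal sketch of a single accumulator that updates both means, both sums of squared deviations, and the co-moment with every pair. It is written in Python for brevity and is not the code from the changesets on this ticket. The class name OnlinePearsonR and its members are made up for this example. It shows only the plain, unweighted Welford-style update, assumes the population (divide by n) convention, and leaves out the input checks that the existing calculators perform.

```python
import math


class OnlinePearsonR:
    """Hypothetical single-pass accumulator for means, variances,
    covariance and Pearson's R (unweighted Welford-style update)."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.syy = 0.0  # running sum of squared deviations of y
        self.sxy = 0.0  # running sum of products of deviations

    def add(self, x, y):
        self.n += 1
        # Deviations from the means *before* this pair is included.
        dx = x - self.mean_x
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Old deviation times new deviation keeps the sums exact
        # without a second pass over the data.
        self.sxx += dx * (x - self.mean_x)
        self.syy += dy * (y - self.mean_y)
        self.sxy += dx * (y - self.mean_y)

    @property
    def variance_x(self):
        return self.sxx / self.n if self.n > 0 else math.nan

    @property
    def variance_y(self):
        return self.syy / self.n if self.n > 0 else math.nan

    @property
    def covariance(self):
        return self.sxy / self.n if self.n > 0 else math.nan

    @property
    def r(self):
        # The 1/n factors cancel, so R needs only the three sums.
        denom = math.sqrt(self.sxx * self.syy)
        return self.sxy / denom if denom > 0.0 else math.nan


acc = OnlinePearsonR()
for x, y in zip([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]):
    acc.add(x, y)
print(acc.mean_x, acc.variance_x, acc.covariance, acc.r)
```

Because all five running quantities live in one object, the input is inspected once per pair instead of once per encapsulated calculator.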

Change History (2)

comment:1 Changed 8 weeks ago by bburlacu

  • Status changed from new to accepted

r17787: Implement extension to Welford's algorithm by Schubert et al in the OnlinePearsonsRCalculator.

comment:2 Changed 8 weeks ago by bburlacu

r17788: Revert r17787 due to numerical precision issues.
