Context Navigation

#3058 accepted defect

Speed of symbolic regression evaluators

Reported by:	bburlacu	Owned by:	bburlacu
Priority:	medium	Milestone:
Component:	Problems.DataAnalysis.Symbolic	Version:
Keywords:		Cc:

Description (last modified by bburlacu)

There is a big difference between the speed of the tree interpreter and the speed of the (R-Squared) evaluator:

For an arithmetic grammar the new native interpreter achieves a speed of approx. 3 billion nodes/second.

By contrast, the Pearson's R-Squared evaluator is more than 20 times slower. Using the values returned by the interpreter, it only manages a speed of 0.14 billion nodes/second when computing the R2 value. This is an obvious bottleneck.

Of course, the evaluator performs some extra work:

Generating random subsets of rows from the training partition

Scaling the values and bounding them to the estimation limits

Checking for error conditions in the calculators (linear scaling calculator, pearson'r correlation calculator)

I think there are some inconsistencies and performance bottlenecks in the SymbolicDataAnalysisEvaluator design:

Spurious usage of Linq and Enumerators

Thread static cache for estimated values (not clear why this is needed since one tree should be evaluated just once)

In 99% of use cases the GenerateRows method just returns an IEnumerable<int> between TrainingPartition.Start and TrainingPartition.End, thus only introducing overhead

The underlying statistical calculators (mean-variance calculator, covariance, correlation, linear scaling) perform the same checks for the added values (IsNan, IsInfinity). this actually adds up to a noticeable amount of overhead since eg., the correlation calculator calls three other calculators in its Add method, performing the same checks

Suggested workarounds:

Use arrays: this will slightly increase memory pressure but we are generating those values anyway (generate rows, get tree values) so it would speed things up overall

As an example, adding a GetSymbolicExpressionTreeValues(ISymbolicExpressionTree, IDataset, int[]) to the ISymbolicDataAnalysisExpressionTreeInterpreter interface would simplify usage and not break existing contracts

Only check values added to the calculator once: I would provide an AddUnchecked method to the online calculators for cases when the caller knows what they are doing

Naively implementing these suggestions results in 2-4x improvement in evaluation speed:

Linear scaling  Evaluator (old)         Evaluator (new) + Checked  Evaluator (new) + Unchecked  Improvement
TRUE            1.383E+008 ± 4.126E+00  2.776E+008 ± 2.538E+00     3.990E+008 ± 2.347E+00       189%
FALSE           2.479E+008 ± 1.095E+00  3.831E+008 ± 2.659E+00     4.881E+008 ± 3.237E+00       97%

On actual problems the improvement is still significant (depending on problem size):

Problem       NMSE (train)  NMSE (test)  Elapsed (old)  Elapsed (new)  Runtime improvement
Airfoil       0.2189        0.2564       24.9           20.5           21.5%
Breiman-I     0.1146        0.1219       78.0           54.2           43.7%
Chemical-I    0.2043        0.3243       22.2           18.4           20.8%
Concrete      0.1513        0.5952       19.6           16.9           16.1%
Friedman-I    0.1384        0.1385       90.9           65.6           38.7%
Friedman-II   0.0412        0.0424       98.9           66.8           48.1%
GP-Challenge  0.0793        0.0794       86.5           60.7           42.5%
Pagie-1       0.0027        0.0235       24.5           20.8           17.8%
Poly-10       0.1698        0.1929       14.7           13.8           6.8%

A proper redesign would likely yield even better results.

Attachments (1)

measure_speed.hl (1.9 KB) - added by bburlacu 5 years ago.: Script for measuring interpreter and evaluator speed

Download all attachments as: .zip

Change History (2)

comment:1 Changed 5 years ago by bburlacu

Description modified (diff)
Owner set to bburlacu
Status changed from new to accepted

Changed 5 years ago by bburlacu

Attachment measure_speed.hl added

Script for measuring interpreter and evaluator speed

Note: See TracTickets for help on using tickets.

Download in other formats: