Opened 8 months ago

Last modified 2 months ago

#2958 readytorelease enhancement

Vectorized/batch-mode interpreter for symbolic expression trees

Reported by: bburlacu Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.16
Component: Problems.DataAnalysis.Symbolic Version: trunk
Keywords: Cc:

Description (last modified by bburlacu)

This ticket explores using batching and vectorisation techniques (i.e., dedicated data types from System.Numerics) to speed up the interpretation of symbolic expression trees.

Batching consists of allocating a small buffer for each instruction and performing each operation on the whole buffer, instead of on the individual value of each row in the dataset.
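A minimal C++ sketch of batching, for illustration only. It assumes the tree has already been compiled into a postfix instruction array and uses a hypothetical opcode set (`OpCode`), batch size (`kBatchSize`) and `Evaluate` function; none of these names come from the actual HeuristicLab interpreters, whose instruction layout may differ.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical opcode set; a real interpreter supports many more symbols.
enum class OpCode { Constant, Variable, Add, Mul };

// Assumed batch size: number of rows each instruction processes at once.
constexpr std::size_t kBatchSize = 64;
using Buffer = std::array<double, kBatchSize>;

// One compiled instruction. Each instruction owns a small buffer that
// holds its results for the current batch of rows.
struct Instruction {
  OpCode op = OpCode::Constant;
  double value = 0.0;            // used by Constant
  const double* data = nullptr;  // column values, used by Variable
  Buffer buf{};
};

// Evaluates a program stored in postfix order over rows [0, rows).
// Operands of Add/Mul are located with a stack of instruction indices.
std::vector<double> Evaluate(std::vector<Instruction>& code, std::size_t rows) {
  std::vector<double> result(rows);
  std::vector<std::size_t> stack;
  auto pop = [&]() -> const Buffer& {
    const std::size_t k = stack.back();
    stack.pop_back();
    return code[k].buf;
  };
  for (std::size_t start = 0; start < rows; start += kBatchSize) {
    const std::size_t n = std::min(kBatchSize, rows - start);
    stack.clear();
    for (std::size_t i = 0; i < code.size(); ++i) {
      Instruction& in = code[i];
      switch (in.op) {
        case OpCode::Constant:
          std::fill(in.buf.begin(), in.buf.begin() + n, in.value);
          break;
        case OpCode::Variable:
          std::copy(in.data + start, in.data + start + n, in.buf.begin());
          break;
        case OpCode::Add: {
          const Buffer& b = pop();
          const Buffer& a = pop();
          // One tight loop per instruction and batch, not one dispatch per row.
          for (std::size_t j = 0; j < n; ++j) in.buf[j] = a[j] + b[j];
          break;
        }
        case OpCode::Mul: {
          const Buffer& b = pop();
          const Buffer& a = pop();
          for (std::size_t j = 0; j < n; ++j) in.buf[j] = a[j] * b[j];
          break;
        }
      }
      stack.push_back(i);  // this instruction's buffer is now an operand
    }
    // The last instruction holds the tree's output for this batch.
    const Buffer& out = code[stack.back()].buf;
    std::copy(out.begin(), out.begin() + n, result.begin() + start);
  }
  return result;
}
```

For example, the postfix program `x 2 * 1 +` computes 2x + 1 for every row, 64 rows per instruction at a time. The inner loops over `j` are where a compiler (or explicit SIMD types) can vectorise the work.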

Vectorisation additionally involves using SIMD (Single Instruction Multiple Data) CPU instructions to speed up batch processing.
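The managed interpreter relies on `Vector<double>`; the C++ sketch below shows the same idea using the GCC/Clang vector extensions (`__attribute__((vector_size(...)))`), which are compiler-specific and not available in MSVC. The `MulAdd` function is hypothetical and only illustrates how one vector expression processes several rows of a batch at once.

```cpp
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: a vector of 4 doubles (256 bits). With AVX
// enabled (e.g. -mavx) operations on it can map to single AVX instructions;
// otherwise the compiler splits them into narrower SIMD or scalar code.
typedef double v4d __attribute__((vector_size(4 * sizeof(double))));

// Computes out[i] = a[i] * b[i] + c[i] for n values: 4 lanes at a time
// with vector arithmetic, then a scalar loop for the remaining tail.
void MulAdd(const double* a, const double* b, const double* c, double* out,
            std::size_t n) {
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    v4d va, vb, vc;
    std::memcpy(&va, a + i, sizeof va);  // memcpy avoids alignment issues
    std::memcpy(&vb, b + i, sizeof vb);
    std::memcpy(&vc, c + i, sizeof vc);
    const v4d r = va * vb + vc;          // element-wise on all 4 lanes
    std::memcpy(out + i, &r, sizeof r);
  }
  for (; i < n; ++i) out[i] = a[i] * b[i] + c[i];
}
```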

Managed (C#) interpreter

Batch processing with the Vector<double> type from System.Numerics achieves a 2-3x speedup compared to the standard linear interpreter.

Native interpreter

A tree interpreter in native code (C++) can offer a significant speed advantage due to more mature compiler backends (MSVC, GCC) and optimizations such as auto-vectorization and loop unrolling.

Preliminary results show a 5-10x speedup compared to the linear tree interpreter. We should also investigate the potential benefit of integrating fast math libraries such as vdt (vectorized math) to increase computation speed.

This functionality should be implemented as an external plugin.

Attachments (1)

PerformanceChart.xlsx (14.6 KB) - added by bburlacu 8 months ago.
Preliminary results


Change History (25)

Changed 8 months ago by bburlacu

Preliminary results

comment:1 Changed 8 months ago by bburlacu

  • Status changed from new to accepted

comment:2 Changed 8 months ago by bburlacu

r16266: Add native interpreter dll wrapper as external lib.

r16269: Add C++ source code

r16274: Update dll files and C++ source code to the latest version.

r16276: Add SymbolicDataAnalysisExpressionTreeNativeInterpreter which calls into the native implementation.

r16277: SymbolicDataAnalysisExpressionTreeNativeInterpreter: add EvaluatedSolutions as parameter, similar to the other interpreters.

Last edited 7 months ago by bburlacu

comment:3 Changed 7 months ago by bburlacu

  • Owner changed from bburlacu to gkronber
  • Status changed from accepted to reviewing

comment:4 Changed 7 months ago by bburlacu

  • Description modified
  • Summary changed from Native interpreter for symbolic expression trees to Vectorized/batch-mode interpreter for symbolic expression trees

comment:5 Changed 7 months ago by bburlacu

r16285: Add vectorized SymbolicDataAnalysisExpressionTreeBatchInterpreter and update project config (Nuget package System.Numerics).

r16286: Forgot to commit changes to project file

r16287: Keep the SymbolicDataAnalysisExpressionTreeBatchInterpreter, but remove vectorization.

r16289: Add plugin dependency to native interpreter plugin.

Last edited 7 months ago by bburlacu

comment:6 Changed 7 months ago by bburlacu

r16293: Support additional symbols in the SymbolicDataAnalysisExpressionTreeBatchInterpreter

comment:7 Changed 7 months ago by bburlacu

r16296: SymbolicDataAnalysisExpressionTreeBatchInterpreter: simplify Compile, add cache for variable values (helps a lot with performance).

r16297: Very minor refactor.

Last edited 7 months ago by bburlacu

comment:8 Changed 7 months ago by bburlacu

r16298: Add batch interpreter performance unit tests (for arithmetic and typecoherent grammar).

comment:9 Changed 7 months ago by bburlacu

r16333: Native interpreter dlls: statically link against the Visual C++ runtime

r16334: Add support for sqrt in the interpreter and update dlls.

Last edited 7 months ago by bburlacu

comment:10 Changed 6 months ago by gkronber

Reviewed the code and made some changes in the branch for #2915. Will merge them back later.

comment:11 Changed 6 months ago by gkronber

In r16356 I merged back changes to the native interpreter from the #2915 branch.

I tested a lot and found that the native interpreter produces exactly the same results as the managed interpreters except when using sqrt() or abs(). Do you have any idea why this might happen?

Last edited 6 months ago by gkronber

comment:12 Changed 6 months ago by gkronber

  • Owner changed from gkronber to bburlacu

comment:13 Changed 6 months ago by gkronber

  • Status changed from reviewing to assigned

The BatchInterpreter assumes that GetSymbolicExpressionTreeValues() is always called with the same dataset. On the first call it caches the supplied dataset and on later calls it just takes the values from the cache.

The API allows interpreters to be called with different datasets, so the supplied dataset must be checked against the cache on each call.

I just fell into this trap and only recognized the problem because I was skeptical of the results. Looking only at my code I would have never found that the interpreter actually ignores the dataset that I set as a parameter.

Please fix!

This is probably true for the native interpreter as well.

comment:14 Changed 6 months ago by bburlacu

r16378: Batch and Native interpreter: keep a cached reference to the dataset so we can detect when it changes.
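A minimal sketch of the idea behind r16378, written in C++ under stated assumptions: `Dataset` and `VariableCache` are hypothetical stand-ins, not the HeuristicLab types, and the dataset is identified by its address. The cache is cleared whenever a different dataset is passed, so later calls never reuse stale column values.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dataset: named columns of doubles.
struct Dataset {
  std::unordered_map<std::string, std::vector<double>> columns;
};

// Caches per-variable column pointers and remembers which dataset they
// came from. If a different dataset is passed, the cache is rebuilt
// instead of silently serving values from the old one.
class VariableCache {
 public:
  const double* Get(const Dataset& ds, const std::string& name) {
    if (&ds != cached_) {  // dataset changed: invalidate all entries
      values_.clear();
      cached_ = &ds;
    }
    auto it = values_.find(name);
    if (it != values_.end()) return it->second;
    const double* p = ds.columns.at(name).data();  // throws if name is unknown
    values_.emplace(name, p);
    return p;
  }

 private:
  const Dataset* cached_ = nullptr;
  std::unordered_map<std::string, const double*> values_;
};
```

Comparing identities only detects a different dataset object. In C++ a new dataset could be allocated at the address of a destroyed one, and in-place modifications of the same dataset are not detected either; a version counter would be a more robust key if that matters.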

comment:15 Changed 6 months ago by bburlacu

r16379: NativeInterpreter: avoid memory leak (free pinned array handles when the cache changes)

comment:16 Changed 6 months ago by gkronber

  • Owner changed from bburlacu to gkronber
  • Status changed from assigned to reviewing

comment:17 Changed 5 months ago by gkronber

  • Version changed from 3.4 to trunk

comment:18 Changed 5 months ago by abeham

r16542: changed reference to PluginInfrastructure from file to project

The project build order was wrong because the dependency on the project was missing: NativeInterpreter could be built before PluginInfrastructure, since the reference wasn't recorded as a project reference.

comment:19 Changed 2 months ago by gkronber

The new interpreters do not support factor variables or several function symbols (average, ...).

Last edited 2 months ago by gkronber

comment:20 Changed 2 months ago by gkronber

Reviewed r16378, r16379, r16542 and tested the new interpreters.

comment:21 Changed 2 months ago by gkronber

  • Status changed from reviewing to assigned

The evaluator should throw an exception when it encounters an unsupported symbol.

comment:22 Changed 2 months ago by gkronber

r16762: added checks and exceptions if native and batch interpreters encounter an unsupported symbol.
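A sketch of such a check in C++, assuming the tree has been flattened into a list of symbol names. The supported set `kSupported` and the function `CheckSupported` are hypothetical and do not reflect the exact symbol list of r16762; the point is to fail before evaluation rather than return silently wrong values.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical set of symbol names the interpreter can evaluate.
const std::unordered_set<std::string> kSupported = {
    "+", "-", "*", "/", "sqrt", "constant", "variable"};

// Validates all symbols of a flattened tree before compilation, so an
// unsupported symbol fails loudly instead of producing wrong results.
void CheckSupported(const std::vector<std::string>& symbols) {
  for (const auto& s : symbols) {
    if (kSupported.count(s) == 0)
      throw std::invalid_argument("Unsupported symbol: " + s);
  }
}
```

For example, `CheckSupported({"+", "average"})` throws `std::invalid_argument` because `average` is not in the set.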

comment:23 Changed 2 months ago by gkronber

  • Status changed from assigned to readytorelease

comment:24 Changed 2 months ago by gkronber

r16768: fixed supported operations in BatchInterpreter (fix failing unit test)
