Opened 9 years ago
Closed 8 years ago
#2577 closed defect (worksforme)
Platform-dependent CLR behavior affects experiment reproducibility
Reported by: | bburlacu | Owned by: | gkronber |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | General | Version: | 3.3.13 |
Keywords: | | Cc: | |
Description
This is an issue I encountered while attempting to reproduce some Hive runs. Despite setting the same seed and using the same experiment file, my local results always differed from those obtained on Hive. Results were reproducible locally, but not across different platforms.
The issue is that the CLR (specifically the JIT compiler) optimizes floating-point code differently depending on the CPU architecture. Related information can be found on Stack Overflow: http://stackoverflow.com/questions/14864238/coercing-floating-point-to-be-deterministic-in-net
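The underlying effect can be illustrated without the JIT at all: floating-point addition is not associative, so any regrouping or reordering of the same operations (which different instruction selections such as SSE scalar code vs. AVX/FMA can introduce) changes the low-order bits of the result. A minimal sketch in Python (the effect is language-independent):

```python
# Floating-point addition is not associative: regrouping the same
# operands changes the low-order bits of the result. A JIT that
# reorders or vectorizes operations can therefore produce slightly
# different values, which an evolutionary algorithm then amplifies
# over many generations.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # False
print(left - right)    # a 1-ULP difference
```

A 1-ULP difference per operation seems negligible, but selection in a GP run turns it into entirely different evolutionary trajectories, which matches the diverging generation counts reported below.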
This behavior can be reproduced by running HL from the attached zip file using the provided .hl file (in the "test file" folder inside the bin directory) on different platforms (e.g., Intel Haswell vs. Intel Ivy Bridge). The binaries were compiled for Any CPU in Release mode. Relevant to this issue are the test function (Friedman-2) and the symbolic expression tree grammar (TypeCoherent - arithmetic + log/exp).
I tested the same HL binary and the same experiment file on four different platforms, compared here: http://ark.intel.com/compare/88967,84695,70847,75122
- work laptop: Intel Ivy Bridge, supports the AVX instruction set extension
- personal desktop computer: Intel Haswell, supports the SSE4.1/4.2 and AVX 2.0 instruction set extensions
- two more computers, one Broadwell and one Skylake, both supporting SSE4.1/4.2 and AVX 2.0
The results were quite different:
- Ivy Bridge (AVX instruction set):
- 38 generations
- best quality 0.87590448125046694
- Haswell, Broadwell, Skylake (SSE4.1/4.2, AVX 2.0):
- 45 generations
- best quality 0.89567687300217524
Therefore, the difference stems from the JIT compiler making use of the different capabilities of each CPU, in this case AVX versus SSE4.1/4.2 and AVX 2.0. To confirm this, I also used the "Intel® Software Development Emulator" (https://software.intel.com/en-us/articles/intel-software-development-emulator) to emulate the Ivy Bridge platform.
HeuristicLab can be run in an emulated environment using the Intel SDE (warning: it will be very slow):
C:\Users\Bogdan\Desktop\hl-evotrack>sde -ivb -- "HeuristicLab 3.3.exe"
Unsurprisingly, the results were identical to those obtained on the real Ivy Bridge machine:
- 38 generations
- best quality 0.87590448125046694
This evidence indicates that (at least under certain conditions) Hive runs are not reproducible locally. So far, the only workaround I can think of is to use a simple grammar (without log/exp), since the basic arithmetic operations appear not to be affected.
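The distinction behind this workaround: IEEE 754 requires +, -, *, /, and sqrt to be correctly rounded, so given the same evaluation order they are bit-identical on any compliant platform, while transcendental functions such as exp and log carry no such guarantee and may differ in the last bits between math libraries and hardware code paths. A hypothetical sketch (Python, not part of HeuristicLab) of a bit-level fingerprint that could detect such divergence between two machines:

```python
import hashlib
import math
import struct

def fingerprint(values):
    """Hash the exact IEEE 754 bit patterns of a sequence of doubles,
    so runs on two machines can be compared for bit-level
    reproducibility. Hypothetical helper, not a HeuristicLab API."""
    h = hashlib.sha256()
    for v in values:
        h.update(struct.pack('<d', v))  # raw bits, not a decimal repr
    return h.hexdigest()[:16]

# +, -, *, / are correctly rounded, so these bits should match on any
# IEEE-754-compliant platform (given the same evaluation order):
safe = [x * 1.5 - 2.25 / (x + 1.0) for x in (0.1, 0.2, 0.3)]

# exp/log have no correct-rounding guarantee; different math libraries
# or JIT code paths may disagree in the last bits and change the hash:
risky = [math.exp(math.log(x + 1.0)) for x in (0.1, 0.2, 0.3)]

print(fingerprint(safe))
print(fingerprint(risky))
```

Comparing such fingerprints from two platforms would localize the divergence to the transcendental functions without rerunning a full experiment.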
Attachments (1)
Change History (5)
Changed 9 years ago by bburlacu
comment:1 Changed 9 years ago by bburlacu
- Owner set to mkommend
- Status changed from new to assigned
comment:2 Changed 9 years ago by gkronber
- Component changed from ### Undefined ### to General
- Owner changed from mkommend to architects
Thanks for the thorough analysis.
From my point of view this means we just have to accept that algorithms using floating-point operations produce different results in different environments (hardware and software). This affects not only symbolic regression but potentially all of our algorithms/problems.
Still, I believe we should strive for reproducibility within the same environment (hardware/software). Algorithm results produced on different hardware can only be compared statistically.
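A minimal sketch of such a statistical comparison, assuming repeated runs per platform (the quality values below are made up for illustration; a real analysis would use a proper statistics package rather than this hand-rolled permutation test):

```python
import random

def permutation_test(a, b, trials=2000, seed=0):
    """Two-sided permutation test on the difference of sample means.
    Sketch only: repeatedly reshuffle the pooled samples and count how
    often a random split yields a mean difference at least as large as
    the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    count = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            count += 1
    return count / trials

# Hypothetical best-quality values from 10 repetitions per platform:
ivy = [0.875, 0.871, 0.880, 0.869, 0.877, 0.874, 0.872, 0.878, 0.870, 0.876]
has = [0.895, 0.891, 0.898, 0.889, 0.896, 0.893, 0.890, 0.897, 0.892, 0.894]
p = permutation_test(ivy, has)
print(p)  # a small p-value indicates the platforms' results differ
```

With enough repetitions per platform, this kind of test distinguishes a genuine hardware-dependent shift from ordinary run-to-run variance of the stochastic algorithm.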
comment:3 Changed 8 years ago by mkommend
- Owner changed from architects to gkronber
comment:4 Changed 8 years ago by gkronber
- Milestone HeuristicLab 3.3.14 deleted
- Resolution set to worksforme
- Status changed from assigned to closed
Discussed. Thanks for the detailed analysis.
HL binaries and test file