Opened 9 years ago
Closed 8 years ago
#2577 closed defect (worksforme)
Platform-dependent CLR behavior affects experiment reproducibility
Reported by: | bburlacu | Owned by: | gkronber |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | General | Version: | 3.3.13 |
Keywords: | | Cc: | |
Description
This is an issue I encountered while attempting to reproduce some Hive runs. Despite setting the same seed and using the same experiment file, my local results always differed from those obtained on Hive. Results were reproducible locally, but not across different platforms.
The issue is that the CLR (specifically the JIT compiler) optimizes floating-point code differently depending on the CPU architecture. Related information can be found on Stack Overflow: http://stackoverflow.com/questions/14864238/coercing-floating-point-to-be-deterministic-in-net
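The underlying effect can be illustrated without the JIT at all: floating-point addition is not associative, so any regrouping or reordering of the same operations (which different instruction selections such as SSE scalar code vs. AVX/FMA can introduce) changes the low-order bits of the result. A minimal sketch in Python (the effect is language-independent):

```python
# Floating-point addition is not associative: regrouping the same
# operands changes the low-order bits of the result. A JIT that
# reorders or vectorizes operations can therefore produce slightly
# different values, which an evolutionary algorithm then amplifies
# over many generations.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # False
print(left - right)    # a 1-ULP difference
```

A 1-ULP difference per operation seems negligible, but selection in a GP run turns it into entirely different evolutionary trajectories, which matches the diverging generation counts reported below.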
This behavior can be reproduced by running HL from the attached zip file using the provided .hl file (in the "test file" folder inside the bin directory) on different platforms (e.g., Intel Haswell vs. Intel Ivy Bridge). The binaries were compiled for Any CPU in Release mode. Relevant to this issue are the test function (Friedman-2) and the symbolic expression tree grammar (TypeCoherent - arithmetic + log/exp).
I tested the same HL binary and the same experiment file on four different platforms, compared here: http://ark.intel.com/compare/88967,84695,70847,75122
- work laptop: Intel Ivy Bridge, supports the AVX instruction set extension
- personal desktop computer: Intel Haswell, supports the SSE4.1/4.2 and AVX 2.0 instruction set extensions
- two more computers, one Broadwell and one Skylake, both supporting SSE4.1/4.2 and AVX 2.0
The results were quite different:
- Ivy Bridge (AVX instruction set):
- 38 generations
- best quality 0.87590448125046694
- Haswell, Broadwell, Skylake (SSE4.1/4.2, AVX 2.0):
- 45 generations
- best quality 0.89567687300217524
Therefore, the difference stems from the JIT compiler making use of the different capabilities of each CPU, in this case AVX versus SSE4.1/4.2 and AVX 2.0. To confirm this, I also used the "Intel® Software Development Emulator" (https://software.intel.com/en-us/articles/intel-software-development-emulator) to emulate the Ivy Bridge platform.
HeuristicLab can be run in an emulated environment using the Intel SDE (warning: it will be very slow):
C:\Users\Bogdan\Desktop\hl-evotrack>sde -ivb -- "HeuristicLab 3.3.exe"
Unsurprisingly, the results were identical to those obtained on the real Ivy Bridge machine:
- 38 generations
- best quality 0.87590448125046694
This evidence indicates that (at least under certain conditions) Hive runs are not reproducible locally. So far, the only workaround I can think of is to use a simple grammar (without log/exp), since the basic arithmetic operations appear not to be affected.
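The distinction behind this workaround: IEEE 754 requires +, -, *, /, and sqrt to be correctly rounded, so given the same evaluation order they are bit-identical on any compliant platform, while transcendental functions such as exp and log carry no such guarantee and may differ in the last bits between math libraries and hardware code paths. A hypothetical sketch (Python, not part of HeuristicLab) of a bit-level fingerprint that could detect such divergence between two machines:

```python
import hashlib
import math
import struct

def fingerprint(values):
    """Hash the exact IEEE 754 bit patterns of a sequence of doubles,
    so runs on two machines can be compared for bit-level
    reproducibility. Hypothetical helper, not a HeuristicLab API."""
    h = hashlib.sha256()
    for v in values:
        h.update(struct.pack('<d', v))  # raw bits, not a decimal repr
    return h.hexdigest()[:16]

# +, -, *, / are correctly rounded, so these bits should match on any
# IEEE-754-compliant platform (given the same evaluation order):
safe = [x * 1.5 - 2.25 / (x + 1.0) for x in (0.1, 0.2, 0.3)]

# exp/log have no correct-rounding guarantee; different math libraries
# or JIT code paths may disagree in the last bits and change the hash:
risky = [math.exp(math.log(x + 1.0)) for x in (0.1, 0.2, 0.3)]

print(fingerprint(safe))
print(fingerprint(risky))
```

Comparing such fingerprints from two platforms would localize the divergence to the transcendental functions without rerunning a full experiment.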
Attachments (1)
Change History (5)
Changed 9 years ago by bburlacu
comment:1 Changed 9 years ago by bburlacu
- Owner set to mkommend
- Status changed from new to assigned
comment:2 Changed 9 years ago by gkronber
- Component changed from ### Undefined ### to General
- Owner changed from mkommend to architects
Thanks for the thorough analysis.
From my point of view this means we just have to accept that algorithms using floating-point operations produce different results in different environments (hardware and software). This affects not only symbolic regression but potentially all of our algorithms/problems.
Still, I believe we should strive for reproducibility within the same environment (hardware/software). Algorithm results produced on different hardware can only be compared statistically.
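A minimal sketch of such a statistical comparison, assuming repeated runs per platform (the quality values below are made up for illustration; a real analysis would use a proper statistics package rather than this hand-rolled permutation test):

```python
import random

def permutation_test(a, b, trials=2000, seed=0):
    """Two-sided permutation test on the difference of sample means.
    Sketch only: repeatedly reshuffle the pooled samples and count how
    often a random split yields a mean difference at least as large as
    the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    count = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            count += 1
    return count / trials

# Hypothetical best-quality values from 10 repetitions per platform:
ivy = [0.875, 0.871, 0.880, 0.869, 0.877, 0.874, 0.872, 0.878, 0.870, 0.876]
has = [0.895, 0.891, 0.898, 0.889, 0.896, 0.893, 0.890, 0.897, 0.892, 0.894]
p = permutation_test(ivy, has)
print(p)  # a small p-value indicates the platforms' results differ
```

With enough repetitions per platform, this kind of test distinguishes a genuine hardware-dependent shift from ordinary run-to-run variance of the stochastic algorithm.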
comment:3 Changed 8 years ago by mkommend
- Owner changed from architects to gkronber
comment:4 Changed 8 years ago by gkronber
- Milestone HeuristicLab 3.3.14 deleted
- Resolution set to worksforme
- Status changed from assigned to closed
Discussed. Thanks for the detailed analysis.
HL binaries and test file