
Opened 8 years ago

Closed 8 years ago

#2577 closed defect (worksforme)

Platform-dependent CLR behavior affects experiment reproducibility

Reported by: bburlacu Owned by: gkronber
Priority: high Milestone:
Component: General Version: 3.3.13
Keywords: Cc:

Description

This is an issue I encountered while attempting to reproduce some Hive runs. Despite setting the same seed and using the same experiment file, my local results were always different from the ones obtained in the Hive. Results were reproducible locally but not across different platforms.

The issue is that the CLR (specifically the JIT compiler) optimizes floating-point code differently depending on the CPU architecture. Some related information can be found on Stack Overflow: http://stackoverflow.com/questions/14864238/coercing-floating-point-to-be-deterministic-in-net
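As a general illustration of why differently optimized code can produce different numbers: floating-point addition is not associative, so any change in the order of operations can change the result. The sketch below is written in Python for brevity and is not HeuristicLab code. The two-lane split is a hypothetical stand-in for a reordered reduction, not a claim about what the CLR JIT actually emits.

```python
def sum_sequential(values):
    # Add the values strictly left to right.
    total = 0.0
    for v in values:
        total += v
    return total


def sum_two_lanes(values):
    # Accumulate even and odd positions separately, then combine.
    # This only mimics a reordered reduction; it is an illustrative assumption.
    lanes = [0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 2] += v
    return lanes[0] + lanes[1]


# Hand-picked values where the small terms are lost next to the large ones.
values = [1e16, 1.0, -1e16, 1.0]

sequential = sum_sequential(values)
reordered = sum_two_lanes(values)
print(sequential, reordered, sequential == reordered)
```

Both functions add the same four numbers, yet the sequential sum loses the first 1.0 when it is added to 1e16, while the reordered sum does not.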

This behavior can be reproduced by running HL from the attached zip file using the provided .hl file (in the folder "test file" inside the bin directory) on different platforms (e.g., Intel Haswell vs. Intel Ivy Bridge). The binaries were compiled for Any CPU in Release mode. Relevant to this issue are the test function used (Friedman-2) and the symbolic expression tree grammar (TypeCoherent - arithmetic + log/exp).

I tested the same HL binary and the same experiment file on four different platforms described here: http://ark.intel.com/compare/88967,84695,70847,75122

  • work laptop: Intel Ivy Bridge, supports the AVX instruction set extensions
  • personal desktop computer: Intel Haswell, supports the SSE4.1/4.2 and AVX 2.0 instruction set extensions
  • two other computers: one Broadwell and one Skylake, both supporting SSE4.1/4.2 and AVX 2.0

The results were quite different:

  • Ivy Bridge (AVX instruction set):
    • 38 generations
    • best quality 0.87590448125046694
  • Haswell, Broadwell, Skylake (SSE4.1/4.2, AVX 2.0):
    • 45 generations
    • best quality 0.89567687300217524

Therefore, the issue lies in the compiler making use of the different capabilities of the CPU, in this case AVX versus SSE4.1/4.2 and AVX 2.0. To confirm this, I also used the "Intel® Software Development Emulator" (https://software.intel.com/en-us/articles/intel-software-development-emulator) to emulate the Ivy Bridge platform.

HeuristicLab can be run in an emulated environment using the Intel SDE (warning: it will be very slow):

    C:\Users\Bogdan\Desktop\hl-evotrack>sde -ivb -- "HeuristicLab 3.3.exe"

Unsurprisingly, the results were identical to the Ivy Bridge results above:

  • 38 generations
  • best quality 0.87590448125046694

This evidence indicates that (at least under certain conditions) Hive runs are not reproducible locally. So far, the only way I could think of to avoid this problem is to use a simple grammar (without log/exp, since the simple arithmetic operations appear not to be affected).
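One plausible mechanism for a last-bit difference between these CPU generations is fused multiply-add (FMA): Haswell and later support FMA3 instructions, while Ivy Bridge does not. Whether the JIT or the runtime math library actually uses FMA for log/exp in this setup is an assumption that the ticket does not confirm. The sketch below, in Python rather than C#, emulates a fused operation with exact rational arithmetic and shows that it can differ from the unfused one. The function names and input values are chosen for illustration only.

```python
from fractions import Fraction


def multiply_add_unfused(a, b, c):
    # Two roundings: the product is rounded to a double, then the sum is rounded.
    return a * b + c


def multiply_add_fused(a, b, c):
    # One rounding: evaluate a*b + c exactly with rationals, then round once.
    # This emulates the result of a hardware FMA for finite inputs.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs picked so that the exact product 1 - 2**-54 is not representable
# as a double and rounds to 1.0 in the unfused version.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

unfused = multiply_add_unfused(a, b, c)
fused = multiply_add_fused(a, b, c)
print(unfused, fused, unfused == fused)
```

The unfused result is exactly zero, while the fused result keeps the tiny residual. In an evolutionary run, a difference of this size in a single fitness evaluation can be enough to change a selection decision, after which the runs diverge.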

Attachments (1)

hl-evotrack.zip (48.4 MB) - added by bburlacu 8 years ago.
HL binaries and test file

Change History (5)

Changed 8 years ago by bburlacu

HL binaries and test file

comment:1 Changed 8 years ago by bburlacu

  • Owner set to mkommend
  • Status changed from new to assigned

comment:2 Changed 8 years ago by gkronber

  • Component changed from ### Undefined ### to General
  • Owner changed from mkommend to architects

Thanks for the thorough analysis.

From my point of view, this means we just have to accept that algorithms using floating-point operations produce different results in different environments (hardware & software). This not only affects symbolic regression but potentially all our algorithms / problems.

Still, I believe we should strive for reproducibility within the same environment (hardware / software). Algorithm results produced on different hardware can only be compared statistically.
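A statistical comparison of this kind could look like the following minimal sketch: a two-sided permutation test on the difference of mean best quality between two platforms. It is written in Python rather than C#, the function name and parameters are hypothetical, and the two samples are synthetic numbers drawn from a random generator. They are not measurements from this ticket.

```python
import random


def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means."""
    rng = random.Random(seed)
    n_a = len(sample_a)

    def mean_difference(pooled):
        first, second = pooled[:n_a], pooled[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(sample_a) + list(sample_b)
    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_permutations):
        # Reassign the platform labels at random and recompute the statistic.
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic placeholder data: 20 simulated runs per platform, same distribution.
data_rng = random.Random(42)
platform_a = [data_rng.gauss(0.88, 0.02) for _ in range(20)]
platform_b = [data_rng.gauss(0.88, 0.02) for _ in range(20)]

print(permutation_test(platform_a, platform_b, n_permutations=2000))
```

With repeated runs from each platform in place of the synthetic lists, a large p-value would suggest that the platforms yield statistically comparable quality even though individual runs differ.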

comment:3 Changed 8 years ago by mkommend

  • Owner changed from architects to gkronber

comment:4 Changed 8 years ago by gkronber

  • Milestone HeuristicLab 3.3.14 deleted
  • Resolution set to worksforme
  • Status changed from assigned to closed

Discussed. Thanks for the detailed analysis.
