Opened 3 years ago

Last modified 2 months ago

#2561 readytorelease defect

TimeLimitRun does not work with the Hive Slave

Reported by: ascheibe Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.16
Component: Optimization Version: trunk
Keywords: depends-2520 Cc:

Description (last modified by ascheibe)

TimeLimitRun's checkpointing triggers the Pause event. The slave subscribes to this event and, when it fires, sends the task back to the server.

Change History (13)

comment:1 Changed 3 years ago by ascheibe

  • Description modified (diff)

comment:2 Changed 3 years ago by ascheibe

  • Owner changed from ascheibe to jkarder
  • Status changed from new to assigned

comment:3 Changed 3 years ago by jkarder

  • Milestone changed from HeuristicLab 3.3.14 to HeuristicLab 3.3.15

comment:4 Changed 2 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.15 to HeuristicLab 3.3.16

comment:5 Changed 22 months ago by abeham

  • Version 3.3.13 deleted

comment:6 Changed 16 months ago by pfleck

There is an additional issue with the TimeLimitRun and HiveSlaves concerning the Pause/Stop of the underlying algorithm.

When the time limit is reached and the TimeLimitRun is supposed to stop, a Pause event is fired right before the expected Stop event. On a local machine, this is barely noticeable and not a big problem in general. On a Hive slave, however, the first Pause event leads to the Slave pausing the task and resending it to the server, where the server reschedules it for another slave (because the task is paused). Thus, the TimeLimitRun never stops running on Hive.
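
The loop described above can be illustrated with a small, self-contained model. This is only a sketch and not HeuristicLab's actual C# code: `TimeLimitedTask`, `Slave`, `schedule`, and the `max_attempts` cap are hypothetical names introduced for illustration. It assumes the slave sends the task back on the first event it receives and the server reschedules any task that comes back paused.

```python
class TimeLimitedTask:
    """Hypothetical stand-in for a TimeLimitRun whose time limit is reached."""

    def __init__(self):
        self.on_paused = None
        self.on_stopped = None

    def run(self):
        # Behaviour described in the comment: Pause fires right before Stop.
        if self.on_paused:
            self.on_paused("Paused")
        if self.on_stopped:
            self.on_stopped("Stopped")


class Slave:
    """Hypothetical slave: sends the task back on the first event it sees."""

    def __init__(self):
        self.returned_state = None

    def execute(self, task):
        self.returned_state = None
        task.on_paused = self._send_back
        task.on_stopped = self._send_back
        task.run()
        return self.returned_state

    def _send_back(self, state):
        # Only the first event counts; afterwards the task is already gone.
        if self.returned_state is None:
            self.returned_state = state


def schedule(task, max_attempts=5):
    """Hypothetical server loop: reschedule while the task comes back paused."""
    state = None
    for attempt in range(1, max_attempts + 1):
        state = Slave().execute(task)
        if state == "Stopped":
            return attempt, state
    # The cap exists only so the model terminates.
    return max_attempts, state


attempts, final_state = schedule(TimeLimitedTask())
print(attempts, final_state)
```

In this model the task is still reported as paused after every attempt, because the Stop event always arrives after the slave has already returned the task.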

comment:7 Changed 4 months ago by gkronber

This problem has become more severe.

Currently, the TimeLimitRun crashes the Hive worker.

comment:8 Changed 4 months ago by abeham

I will look into it. I will try to change the TimeLimitRun so that Pause events of the underlying Algorithm do not cause a Pause in the TimeLimitRun when only a snapshot is being taken. Then, hopefully, the worker will not be aware of the Pause event.

comment:9 Changed 4 months ago by abeham

r16651: Decoupled execution state of timelimitrun and its embedded algorithm

The TimeLimitRun should no longer pause when it is creating a snapshot. Please check whether this works for you.
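
The idea of decoupling the two execution states can be sketched roughly as follows. This is a simplified model and not the code committed in r16651: the classes `Algorithm` and `DecoupledTimeLimitRun`, the `_pause_reason` field, and the string states are all assumptions made for illustration. The wrapper keeps its own state and only reports a Pause to outside listeners when the pause was not caused by a snapshot or by reaching the time limit.

```python
class Algorithm:
    """Hypothetical embedded algorithm with its own execution state."""

    def __init__(self):
        self.state = "Prepared"
        self.paused_handlers = []

    def start(self):
        self.state = "Started"

    def pause(self):
        self.state = "Paused"
        for handler in list(self.paused_handlers):
            handler()

    def stop(self):
        self.state = "Stopped"


class DecoupledTimeLimitRun:
    """Hypothetical wrapper whose state is independent of the algorithm's."""

    def __init__(self, algorithm):
        self.algorithm = algorithm
        self.state = "Prepared"
        self.events = []     # events visible to listeners such as a slave
        self.snapshots = []  # snapshot counter values, for illustration
        self._pause_reason = None
        algorithm.paused_handlers.append(self._on_algorithm_paused)

    def start(self):
        self.state = "Started"
        self.events.append("Started")
        self.algorithm.start()

    def snapshot(self):
        self._pause_reason = "snapshot"
        self.algorithm.pause()

    def reach_time_limit(self):
        self._pause_reason = "limit"
        self.algorithm.pause()

    def pause(self):
        self._pause_reason = "user"
        self.algorithm.pause()

    def _on_algorithm_paused(self):
        reason, self._pause_reason = self._pause_reason, None
        if reason == "snapshot":
            # Take the snapshot and resume; the outer state stays Started.
            self.snapshots.append(len(self.snapshots) + 1)
            self.algorithm.start()
        elif reason == "limit":
            # Go straight to Stopped without exposing the inner Pause.
            self.algorithm.stop()
            self.state = "Stopped"
            self.events.append("Stopped")
        else:
            self.state = "Paused"
            self.events.append("Paused")


run = DecoupledTimeLimitRun(Algorithm())
run.start()
run.snapshot()
run.reach_time_limit()
print(run.events)
```

In this model a listener only ever sees Started and Stopped, so a slave that reacts to Pause would have nothing to react to during a snapshot or when the limit is reached.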

comment:10 Changed 4 months ago by abeham

  • Component changed from Hive.Client to Optimization
  • Owner changed from jkarder to gkronber
  • Status changed from assigned to reviewing
  • Version set to trunk

comment:11 Changed 3 months ago by gkronber

  • Status changed from reviewing to readytorelease

I tested the changes locally and on Hive, using an experiment with batch runs of TimeLimitRun with GA - TSP. It seems to work now.

Reviewed r16651.

comment:12 Changed 3 months ago by gkronber

Must be merged together with, or after, the persistence changes.

comment:13 Changed 2 months ago by abeham

  • Keywords depends-2520 added