Opened 4 years ago

Closed 7 months ago

#2561 closed defect (done)

TimeLimitRun does not work with the Hive Slave

Reported by: ascheibe Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.16
Component: Optimization Version: trunk
Keywords: merged Cc:

Description (last modified by ascheibe)

TimeLimitRun's check-pointing triggers the Pause event. The slave subscribes to this event and, when it fires, sends the task back to the server.

Change History (15)

comment:1 Changed 4 years ago by ascheibe

  • Description modified (diff)

comment:2 Changed 4 years ago by ascheibe

  • Owner changed from ascheibe to jkarder
  • Status changed from new to assigned

comment:3 Changed 4 years ago by jkarder

  • Milestone changed from HeuristicLab 3.3.14 to HeuristicLab 3.3.15

comment:4 Changed 3 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.15 to HeuristicLab 3.3.16

comment:5 Changed 2 years ago by abeham

  • Version 3.3.13 deleted

comment:6 Changed 23 months ago by pfleck

There is an additional issue with the TimeLimitRun and HiveSlaves concerning the Pause/Stop of the underlying algorithm.

When the time limit is reached and the TimeLimitRun is supposed to stop, a Pause event is fired right before the expected Stop event. On a local machine, this is barely noticeable and not a big problem in general. On a Hive slave, however, the first Pause event leads to the Slave pausing the task and resending it to the server, where the server reschedules it for another slave (because the task is paused). Thus, the TimeLimitRun never stops running on Hive.
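
The loop described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HeuristicLab or Hive code: `ToyRun`, `ToySlave`, `ToyServer` and `simulate` are hypothetical names, and the model assumes that the slave hands the task back on the first event it sees, so that the subsequent Stop is never reported.

```python
class ToyRun:
    """Hypothetical stand-in for a run that fires Pause right before Stop."""

    def __init__(self):
        self.on_paused = []
        self.on_stopped = []

    def reach_time_limit(self):
        # Pause is fired first, Stop immediately afterwards.
        for handler in list(self.on_paused):
            handler()
        for handler in list(self.on_stopped):
            handler()


class ToyServer:
    """Hypothetical server: reschedules paused tasks, finishes stopped ones."""

    def __init__(self):
        self.reschedules = 0
        self.finished = False

    def receive(self, state):
        if state == "Paused":
            self.reschedules += 1
        else:
            self.finished = True


class ToySlave:
    """Hypothetical slave: returns the task on the first event it observes."""

    def __init__(self, server):
        self.server = server

    def execute(self, run):
        returned = False

        def report(state):
            nonlocal returned
            if not returned:  # assumption: the task is sent back only once
                returned = True
                self.server.receive(state)

        run.on_paused.append(lambda: report("Paused"))
        run.on_stopped.append(lambda: report("Stopped"))
        run.reach_time_limit()


def simulate(rounds):
    """Let up to `rounds` slaves pick up the task in turn."""
    server = ToyServer()
    for _ in range(rounds):
        if server.finished:
            break
        ToySlave(server).execute(ToyRun())
    return server.reschedules, server.finished
```

In this model, `simulate(5)` reschedules the task five times and never marks it as finished, which mirrors the behaviour reported here.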

comment:7 Changed 11 months ago by gkronber

This problem has become more severe.

Currently, the TimeLimitRun crashes the hive worker.

comment:8 Changed 11 months ago by abeham

I will look into it. I will try to change the behaviour so that Pause events of the underlying Algorithm do not cause the TimeLimitRun to pause when only a snapshot is being taken. Then, hopefully, the worker will not see the Pause event at all.

comment:9 Changed 11 months ago by abeham

r16651: Decoupled execution state of timelimitrun and its embedded algorithm

The TimeLimitRun should no longer pause while it is creating a snapshot. Please check whether this works for you.
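
As a rough illustration of what decoupling the two execution states can look like, here is a minimal sketch. It is not the code from r16651: `InnerAlgorithm`, `DecoupledRun` and all member names are hypothetical. The assumption is that the wrapper keeps its own state and forwards a Pause to its listeners only when the pause was not requested for a snapshot.

```python
class InnerAlgorithm:
    """Hypothetical embedded algorithm with its own execution state."""

    def __init__(self):
        self.state = "Prepared"
        self.on_paused = []

    def start(self):
        self.state = "Started"

    def pause(self):
        self.state = "Paused"
        for handler in list(self.on_paused):
            handler()


class DecoupledRun:
    """Hypothetical wrapper whose state is independent of the algorithm's."""

    def __init__(self, algorithm):
        self.algorithm = algorithm
        self.state = "Prepared"
        self.on_paused = []
        self.snapshots = 0
        self._snapshot_pending = False
        algorithm.on_paused.append(self._algorithm_paused)

    def start(self):
        self.state = "Started"
        self.algorithm.start()

    def take_snapshot(self):
        # Pause the inner algorithm only; the wrapper stays "Started".
        self._snapshot_pending = True
        self.algorithm.pause()

    def pause(self):
        # A real pause request: this one is forwarded to listeners.
        self.algorithm.pause()

    def _algorithm_paused(self):
        if self._snapshot_pending:
            # Snapshot case: record it, resume, and do not raise Paused.
            self._snapshot_pending = False
            self.snapshots += 1
            self.algorithm.start()
            return
        self.state = "Paused"
        for handler in list(self.on_paused):
            handler()
```

A listener registered on `DecoupledRun.on_paused`, such as a worker, is notified by `pause()` but not by `take_snapshot()`.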

comment:10 Changed 11 months ago by abeham

  • Component changed from Hive.Client to Optimization
  • Owner changed from jkarder to gkronber
  • Status changed from assigned to reviewing
  • Version set to trunk

comment:11 Changed 10 months ago by gkronber

  • Status changed from reviewing to readytorelease

I tested the changes locally and on Hive using an experiment with batch runs of TimeLimitRun with GA - TSP. It seems to work now.

Reviewed r16651.

comment:12 Changed 10 months ago by gkronber

Must be merged with/after persistence.

comment:13 Changed 9 months ago by abeham

  • Keywords depends-2520 added

comment:14 Changed 7 months ago by abeham

  • Keywords merged added; depends-2520 removed

r17115: merged to stable (16651)

comment:15 Changed 7 months ago by abeham

  • Resolution set to done
  • Status changed from readytorelease to closed