Opened 9 years ago
Closed 5 years ago
#2561 closed defect (done)
TimeLimitRun does not work with the Hive Slave
Reported by: | ascheibe | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.16 |
Component: | Optimization | Version: | trunk |
Keywords: | merged | Cc: | |
Description (last modified by ascheibe)
TimeLimitRun's checkpointing triggers the Pause event. The slave subscribes to this event and, when it fires, sends the task back to the server.
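The interaction described above can be modeled in a few lines. This is only a sketch in Python, not HeuristicLab code: `Event`, `CoupledTimeLimitRun`, `Slave`, `take_snapshot` and `sent_back` are all hypothetical names, and the assumption is that a snapshot raises the same Pause event as a real pause.

```python
# Hypothetical model, not HeuristicLab's API: every name here is illustrative.

class Event:
    """Minimal multicast event, standing in for a .NET event."""
    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def fire(self):
        for handler in list(self._handlers):
            handler()


class CoupledTimeLimitRun:
    """Wrapper whose snapshot is implemented as a pause of the whole run."""
    def __init__(self):
        self.paused = Event()
        self.snapshots = 0

    def take_snapshot(self):
        # Assumed behavior from the ticket: checkpointing raises Pause,
        # so observers cannot tell a snapshot from a requested pause.
        self.paused.fire()
        self.snapshots += 1


class Slave:
    """Worker that returns its task to the server whenever the task pauses."""
    def __init__(self, run):
        self.sent_back = 0
        run.paused.subscribe(self._on_paused)

    def _on_paused(self):
        # Stand-in for "send the task back to the server".
        self.sent_back += 1


run = CoupledTimeLimitRun()
slave = Slave(run)
run.take_snapshot()
print(slave.sent_back)  # 1: a single snapshot already returns the task
```

In this model the slave has no way to distinguish a checkpoint from a pause, which matches the symptom reported here.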
Change History (15)
comment:1 Changed 9 years ago by ascheibe
- Description modified (diff)
comment:2 Changed 8 years ago by ascheibe
- Owner changed from ascheibe to jkarder
- Status changed from new to assigned
comment:3 Changed 8 years ago by jkarder
- Milestone changed from HeuristicLab 3.3.14 to HeuristicLab 3.3.15
comment:4 Changed 7 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.15 to HeuristicLab 3.3.16
comment:5 Changed 7 years ago by abeham
- Version 3.3.13 deleted
comment:6 Changed 7 years ago by pfleck
comment:7 Changed 6 years ago by gkronber
This problem has become more severe.
Currently, the TimeLimitRun crashes the hive worker.
comment:8 Changed 6 years ago by abeham
I will look into it. I will try to change the behavior so that Pause events of the underlying algorithm do not cause a Pause in the TimeLimitRun when only a snapshot is being taken. Then, hopefully, the worker will not be aware of the Pause event.
comment:9 Changed 6 years ago by abeham
r16651: Decoupled execution state of timelimitrun and its embedded algorithm
The TimeLimitRun should no longer pause when it creates a snapshot. Please check whether this works for you.
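To illustrate the idea of decoupling the two execution states, here is a rough Python sketch. It is not the code from r16651: the class names, the `_snapshot_pending` flag and the callback lists are assumptions made for illustration only. The wrapper keeps its own state and only reports a pause to outside observers when one was actually requested.

```python
# Hypothetical sketch of the decoupling idea; names are not HeuristicLab's API.

class Algorithm:
    """Embedded algorithm with its own execution state."""
    def __init__(self):
        self.state = "Prepared"
        self.on_paused = []  # callbacks invoked when the algorithm pauses

    def start(self):
        self.state = "Started"

    def pause(self):
        self.state = "Paused"
        for callback in list(self.on_paused):
            callback()


class DecoupledTimeLimitRun:
    """Wrapper that keeps its own state, separate from the algorithm's."""
    def __init__(self, algorithm):
        self.algorithm = algorithm
        self.state = "Prepared"
        self.on_paused = []  # what an outside observer (e.g. a worker) sees
        self.snapshots = []
        self._snapshot_pending = False
        algorithm.on_paused.append(self._algorithm_paused)

    def start(self):
        self.state = "Started"
        self.algorithm.start()

    def take_snapshot(self):
        # Pause only the inner algorithm; the wrapper stays "Started".
        self._snapshot_pending = True
        self.algorithm.pause()

    def pause(self):
        # A real pause request is propagated to observers.
        self._snapshot_pending = False
        self.algorithm.pause()

    def _algorithm_paused(self):
        if self._snapshot_pending:
            self._snapshot_pending = False
            self.snapshots.append("snapshot")
            self.algorithm.start()  # resume without notifying observers
            return
        self.state = "Paused"
        for callback in list(self.on_paused):
            callback()


seen = []
run = DecoupledTimeLimitRun(Algorithm())
run.on_paused.append(lambda: seen.append("paused"))
run.start()
run.take_snapshot()
after_snapshot = (run.state, list(seen))
run.pause()
after_pause = (run.state, list(seen))
```

With this split, a worker subscribed to the wrapper's pause notification sees nothing during a snapshot and is still informed of a genuine pause.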
comment:10 Changed 6 years ago by abeham
- Component changed from Hive.Client to Optimization
- Owner changed from jkarder to gkronber
- Status changed from assigned to reviewing
- Version set to trunk
comment:11 Changed 6 years ago by gkronber
- Status changed from reviewing to readytorelease
I tested the changes locally and on Hive using an experiment with batch runs of TimeLimitRun with GA - TSP. It seems to work now.
Reviewed r16651.
comment:12 Changed 6 years ago by gkronber
Must be merged together with, or after, the persistence changes.
comment:13 Changed 6 years ago by abeham
- Keywords depends-2520 added
comment:14 Changed 5 years ago by abeham
- Keywords merged added; depends-2520 removed
r17115: merged to stable (16651)
comment:15 Changed 5 years ago by abeham
- Resolution set to done
- Status changed from readytorelease to closed
There is an additional issue with the TimeLimitRun and HiveSlaves concerning the Pause/Stop of the underlying algorithm.
When the time limit is reached and the TimeLimitRun is supposed to stop, a Pause event is fired right before the expected Stop event. On a local machine this is barely noticeable and generally not a problem. On a Hive slave, however, that first Pause event causes the slave to pause the task and send it back to the server, which reschedules it on another slave because the task is paused. As a result, the TimeLimitRun never stops running on Hive.
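The rescheduling loop can be reproduced with a small simulation. This is a simplified Python sketch under stated assumptions, not Hive code: the function names and status strings are hypothetical, each dispatch is assumed to hit the time limit immediately, and `max_dispatches` exists only so the demonstration terminates.

```python
# Hypothetical model of the scheduling loop; names are illustrative only.

def run_until_time_limit():
    """Events observed by the worker when the time limit is reached.

    Per the ticket, a Pause is fired right before the expected Stop.
    """
    return ["Paused", "Stopped"]


def slave_execute(events):
    """Worker policy: react to the first lifecycle event it sees."""
    for event in events:
        if event == "Paused":
            return "Paused"  # task is sent back to the server
        if event == "Stopped":
            return "Finished"
    return "Running"


def server_schedule(max_dispatches):
    """Dispatch the task until it finishes or the dispatch budget runs out."""
    dispatches = 0
    status = "Waiting"
    while status != "Finished" and dispatches < max_dispatches:
        dispatches += 1
        # A paused task is simply handed to the next slave.
        status = slave_execute(run_until_time_limit())
    return status, dispatches


status, dispatches = server_schedule(max_dispatches=5)
print(status, dispatches)  # Paused 5: the task never reaches "Finished"
```

Because the Pause always arrives first, the Stop is never observed by any slave in this model, no matter how large the dispatch budget is.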