Opened 11 years ago
Closed 10 years ago
#2153 closed defect (done)
Add a timeout for stopping a Hive task
| Reported by: | ascheibe | Owned by: | ascheibe |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.10 |
| Component: | Hive.Slave | Version: | 3.3.9 |
| Keywords: | | Cc: | |
Description (last modified by ascheibe)
Currently, Stop() is called on the task, and if the task does not terminate properly, the executor waits indefinitely. There should be a timeout on this wait. See Executor.cs, line 136. There is already a configurable timeout that is used when starting tasks (ExecutorSemTimeouts), and it can be reused here.
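The requested behaviour can be illustrated with a minimal sketch. It is written in Python for brevity and is not the HeuristicLab code in Executor.cs; `stop_with_timeout` and `StopTimeoutError` are hypothetical names, and the timeout value is assumed to come from an existing setting such as ExecutorSemTimeouts.

```python
import threading


class StopTimeoutError(Exception):
    """Raised when a task does not confirm that it stopped in time (hypothetical)."""


def stop_with_timeout(stop, timeout_seconds):
    """Call stop() on a worker thread and wait at most timeout_seconds for it."""
    finished = threading.Event()
    errors = []

    def run():
        try:
            stop()
        except Exception as exc:  # keep the error so the caller can see it
            errors.append(exc)
        finally:
            finished.set()

    # daemon=True so a stop() that never returns cannot keep the process alive.
    threading.Thread(target=run, daemon=True).start()

    # Event.wait returns False if the timeout elapses before the event is set.
    if not finished.wait(timeout_seconds):
        raise StopTimeoutError(f"task did not stop within {timeout_seconds} s")
    if errors:
        raise errors[0]
```

With something like this, a caller could treat `StopTimeoutError` as a failed task instead of blocking forever. How the slave should report such a failure is not specified in this description.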
Change History (11)
comment:1 Changed 10 years ago by ascheibe
- Description modified (diff)
comment:2 Changed 10 years ago by ascheibe
comment:3 Changed 10 years ago by ascheibe
- Status changed from new to accepted
I removed ExceptionOccured because it has more or less the same semantics as TaskFailed. The plan was that ExceptionOccured would be raised when something went wrong while executing a task, and TaskFailed when something went wrong with starting, stopping, or pausing it. As a result, the handling of both was similar and code was duplicated. The only difference was that ExceptionOccured rescheduled the task, while TaskFailed prevented the task from being run again. But as #2154 mentions, that is not desired anyway.
comment:4 Changed 10 years ago by ascheibe
- Owner changed from ascheibe to mkommend
- Status changed from accepted to reviewing
comment:5 Changed 10 years ago by mkommend
- Owner changed from mkommend to ascheibe
- Status changed from reviewing to assigned
r11082 looks OK.
However, I tried to test the changes by uploading a job that immediately throws an exception. The execution time no longer increases (it stays at 0.1 s), but the job is still shown as calculating after about 10 minutes.
comment:6 Changed 10 years ago by mkommend
Btw, I have shared the hanging hive job with you.
comment:7 Changed 10 years ago by ascheibe
- Owner changed from ascheibe to mkommend
- Status changed from assigned to reviewing
r11113 fixed the assembly file version lookup so that it also works in sandboxes. FileVersionInfo.GetVersionInfo(..) needs LinkDemand, which we don't allow in a Hive sandbox, and therefore throws an exception. This leads to tasks that get rescheduled or just stay paused on the slave and never get sent back to the server.
comment:8 Changed 10 years ago by ascheibe
I have installed the new version of the slave on blade01, you can use it for testing.
comment:9 Changed 10 years ago by ascheibe
r11117 changed exception type
comment:10 Changed 10 years ago by mkommend
- Owner changed from mkommend to ascheibe
- Status changed from reviewing to readytorelease
comment:11 Changed 10 years ago by ascheibe
- Resolution set to done
- Status changed from readytorelease to closed
r11082