Opened 12 years ago
Closed 12 years ago
#2052 closed defect (done)
BatchRun only executes one run of a CrossValidation
Reported by: | sforsten | Owned by: | mkommend |
---|---|---|---|
Priority: | highest | Milestone: | HeuristicLab 3.3.9 |
Component: | Algorithms.DataAnalysis | Version: | 3.3.8 |
Keywords: | Cc: |
Description (last modified by sforsten)
If CrossValidation is set as optimizer in a BatchRun, only the first run is executed. The BatchRun remains in the ExecutionState.Started and even if the BatchRun is stopped and prepared again, no additional run can be executed anymore.
The problem is that CrossValidation uses a lock in several methods, which are accessed through events by multiple cloned algorithms at the same time. The last algorithm which accesses ClonedAlgorithm_Stopped calls the method OnStopped which informs BatchRun that CrossValidation has stopped. Therefore, BatchRun prepares and starts CrossValidation again, which leads to the call of ClonedAlgorithm_Started by the algorithms started by CrossValidation.
The lock used in CrossValidation is acquired in the method ClonedAlgorithm_Stopped and the call hierarchy leads through BatchRun to the method ClonedAlgorithm_Started where the lock needs to be acquired again. This leads to a deadlock.
Change History (13)
comment:1 Changed 12 years ago by sforsten
- Description modified (diff)
comment:2 Changed 12 years ago by sforsten
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.9
- Status changed from new to accepted
- Version changed from 3.3.9 to 3.3.8
comment:3 Changed 12 years ago by sforsten
- Owner changed from sforsten to mkommend
- Status changed from accepted to reviewing
comment:4 follow-up: ↓ 5 Changed 12 years ago by gkronber
- Owner changed from mkommend to gkronber
- Status changed from reviewing to assigned
The lock cannot simply be removed as it is possible that several threads call ClonedAlgorithm_Started in parallel. I'll try to fix the problem for the next release.
comment:5 in reply to: ↑ 4 Changed 12 years ago by mkommend
Replying to gkronber:
The lock cannot simply be removed as it is possible that several threads call ClonedAlgorithm_Started in parallel. I'll try to fix the problem for the next release.
Yes you are right! sforsten and I have overlooked that the result collection is not thread-safe. An easy fix would be to introduce another lock object for access to the result collection, as the existing locker cannot be used due to deadlocks.
comment:6 Changed 12 years ago by mkommend
r9504: Added lock object to synchronize access to the results collection.
comment:7 Changed 12 years ago by mkommend
- Status changed from assigned to reviewing
comment:8 Changed 12 years ago by gkronber
There are multiple usages of the results collection within the CV class and also potentially from other classes.
comment:9 Changed 12 years ago by gkronber
r9517 minor adaptations in cross-validation
comment:10 Changed 12 years ago by gkronber
- Owner changed from gkronber to mkommend
comment:11 Changed 12 years ago by mkommend
Reviewed r9517.
comment:12 Changed 12 years ago by mkommend
- Status changed from reviewing to readytorelease
comment:13 Changed 12 years ago by mkommend
- Resolution set to done
- Status changed from readytorelease to closed
r9493: the lock in ClonedAlgorithm_Started has been removed as well as all code related to the bool variable startPending. startPending is never really used. In the methods it is used, there is no chance that the if statement would evaluate to anything else than true.
One problem still remains. If the stop button of the BatchRun is pressed when its optimizer is going to be started for the next repetition, the BatchRun will be set to stopped, while the optimizer will start another run. When the optimizer is finished the BatchRun will be set to paused.