Opened 4 years ago

Closed 4 years ago

#2052 closed defect (done)

BatchRun only executes one run of a CrossValidation

Reported by: sforsten Owned by: mkommend
Priority: highest Milestone: HeuristicLab 3.3.9
Component: Algorithms.DataAnalysis Version: 3.3.8
Keywords: Cc:

Description (last modified by sforsten)

If CrossValidation is set as optimizer in a BatchRun, only the first run is executed. The BatchRun remains in the ExecutionState.Started and even if the BatchRun is stopped and prepared again, no additional run can be executed anymore.

The problem is that CrossValidation uses a lock in several methods, which are accessed through events by multiple cloned algorithms at the same time. The last algorithm which accesses ClonedAlgorithm_Stopped calls the method OnStopped which informs BatchRun that CrossValidation has stopped. Therefore, BatchRun prepares and starts CrossValidation again, which leads to the call of ClonedAlgorithm_Started by the algorithms started by CrossValidation.

The lock used in CrossValidation is acquired in the method ClonedAlgorithm_Stopped and the call hierarchy leads through BatchRun to the method ClonedAlgorithm_Started where the lock needs to be acquired again. This leads to a deadlock.

Change History (13)

comment:1 Changed 4 years ago by sforsten

  • Description modified (diff)

comment:2 Changed 4 years ago by sforsten

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.9
  • Status changed from new to accepted
  • Version changed from 3.3.9 to 3.3.8

r9493: the lock in ClonedAlgorithm_Started has been removed as well as all code related to the bool variable startPending. startPending is never really used. In the methods it is used, there is no chance that the if statement would evaluate to anything else than true.

One problem still remains. If the stop button of the BatchRun is pressed when its optimizer is going to be started for the next repetition, the BatchRun will be set to stopped, while the optimizer will start another run. When the optimizer is finished the BatchRun will be set to paused.

comment:3 Changed 4 years ago by sforsten

  • Owner changed from sforsten to mkommend
  • Status changed from accepted to reviewing

comment:4 follow-up: Changed 4 years ago by gkronber

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

The lock cannot simply be removed as it is possible that several threads call ClonedAlgorithm_Started in parallel. I'll try to fix the problem for the next release.

comment:5 in reply to: ↑ 4 Changed 4 years ago by mkommend

Replying to gkronber:

The lock cannot simply be removed as it is possible that several threads call ClonedAlgorithm_Started in parallel. I'll try to fix the problem for the next release.

Yes you are right! sforsten and I have overlooked that the result collection is not thread-safe. An easy fix would be to introduce another lock object for access to the result collection, as the existing locker cannot be used due to deadlocks.

comment:6 Changed 4 years ago by mkommend

r9504: Added lock object to synchronize access to the results collection.

comment:7 Changed 4 years ago by mkommend

  • Status changed from assigned to reviewing

comment:8 Changed 4 years ago by gkronber

There are multiple usages of the results collection within the CV class and also potentially from other classes.

comment:9 Changed 4 years ago by gkronber

r9517 minor adaptations in cross-validation

comment:10 Changed 4 years ago by gkronber

  • Owner changed from gkronber to mkommend

Reviewed and tested r9504 and did not find any issues. Please merge the changes r9504 and r9517 into the stable branch.

comment:11 Changed 4 years ago by mkommend

Reviewed r9517.

comment:12 Changed 4 years ago by mkommend

  • Status changed from reviewing to readytorelease

comment:13 Changed 4 years ago by mkommend

  • Resolution set to done
  • Status changed from readytorelease to closed

r9525: Merged r9493, r9504, and r9517 into the stable branch.

Note: See TracTickets for help on using tickets.