Opened 4 years ago

Closed 4 years ago

#2019 closed defect (done)

Improve the use of db transactions in the Hive server

Reported by: ascheibe Owned by: abeham
Priority: medium Milestone: HeuristicLab 3.3.8
Component: Hive.Server Version: 3.3.8
Keywords: Cc:

Description

There are still some webservice methods that don't use transactions which should be changed (standard for db connections is isolation level serializable which is bad performance-wise). Also scheduling of tasks is executed under one big transaction. Transactions should be as small as possible so this should also be changed. These two things should reduce deadlocks and therefore also improve performance for up/downloads of tasks.

Change History (15)

comment:1 Changed 4 years ago by ascheibe

  • Status changed from new to accepted

comment:2 Changed 4 years ago by ascheibe

r9257

  • added missing transactions in the Hive service
  • split scheduling transaction into smaller transactions
  • improved speed of job uploading (AddTask)
  • changed highest isolation level from Serializable to RepeatableRead as phantom reads shouldn't be a problem

comment:3 Changed 4 years ago by ascheibe

r9259 improved resource assigning method

comment:4 Changed 4 years ago by ascheibe

r9261 because GetUser sometimes fails try to repeat it

comment:5 Changed 4 years ago by ascheibe

r9266 disable lazy loading in some methods to avoid deadlocks

comment:6 Changed 4 years ago by ascheibe

r9304 UpdateTaskState has to return a complete task object

comment:7 Changed 4 years ago by ascheibe

  • Owner changed from ascheibe to abeham
  • Status changed from accepted to reviewing

comment:8 Changed 4 years ago by abeham

Reviewed r9257:

  • Many and sometimes redundant calls to AuthorizeForTask, for example in
    • HiveService.GetLightweightChildTasks(Guid?, bool, bool)
    • HiveService.GetTask(Guid)
  • Check if task and taskdata belong together in HiveService.UpdateTaskData(Task, TaskData)´
  • HeartbeatManager.AssignJob -> AssignTask

comment:9 follow-up: Changed 4 years ago by abeham

  • Owner changed from abeham to ascheibe
  • Status changed from reviewing to assigned

Reviewed r9261:

  • What is the reason - why does it fail?

Reviewed r9266:

  • In Convert.ToEntity(DT.Task source, DB.Task target) I found the following, but in this thread it is stated that the EntitySet property to a referenced table is never null.
    if (target.StateLogs == null) target.StateLogs = new EntitySet<DB.StateLog>();
    foreach (DT.StateLog sl in source.StateLog.Where(x => x.Id == Guid.Empty)) {
      target.StateLogs.Add(Convert.ToEntity(sl));
    }
    
  • Not touching the statelog in UpdateTask is probably helping performance, nevertheless in UpdateTaskAndStatelog you should probably fetch the statelog together with the task by including it in the context's LoadOptions.

Reviewed r9304:

  • Why do you create three different contexts? If the whole procedure runs inside a transaction, nothing will be written to the DB until the whole method completes. Also if you need the full task object for conversion to DTO you could do that after the update of its state, no?
Last edited 4 years ago by gkronber (previous) (diff)

comment:10 Changed 4 years ago by ascheibe

r9426

  • removed useless code as suggested by abeham
  • enabled lazy loading for binary data (plugins and task data) as suggested by pfleck

comment:11 in reply to: ↑ 9 Changed 4 years ago by ascheibe

Replying to abeham:

Reviewed r9261:

  • What is the reason - why does it fail?

The reason is a deadlock in the authentication db when there is heavy load on the database. If this call fails, then the whole ws call fails which is the reason why I added repeating.

Reviewed r9266:

  • In Convert.ToEntity(DT.Task source, DB.Task target) I found the following, but in this thread it is stated that the EntitySet property to a referenced table is never null.
    if (target.StateLogs == null) target.StateLogs = new EntitySet<DB.StateLog>();
    foreach (DT.StateLog sl in source.StateLog.Where(x => x.Id == Guid.Empty)) {
      target.StateLogs.Add(Convert.ToEntity(sl));
    }
    

Thanks, I have fixed this in r9426.

  • Not touching the statelog in UpdateTask is probably helping performance, nevertheless in UpdateTaskAndStatelog you should probably fetch the statelog together with the task by including it in the context's LoadOptions.

I have fixed this in r9427.

Reviewed r9304:

  • Why do you create three different contexts? If the whole procedure runs inside a transaction, nothing will be written to the DB until the whole method completes. Also if you need the full task object for conversion to DTO you could do that after the update of its state, no?

The reason is that I disabled deferred loading for the first two db contexts, because I wanted to avoid that anything else is loaded as this is just an insert and an update of one field. The last one has deferred loading enabled as the conversion method also needs access to the statelogs. But you are right, the first two db contexts could be merged. I will change this.

Last edited 4 years ago by gkronber (previous) (diff)

comment:12 Changed 4 years ago by ascheibe

r9427 added LoadWith for UpdateTaskAndStateLogs

comment:13 Changed 4 years ago by ascheibe

  • Owner changed from ascheibe to abeham
  • Status changed from assigned to reviewing

r9428 removed redundant db context in UpdateTaskState. I'm not sure if querying the task again is really necessary. But I'm leaving it for now as pfleck has rewritten the method anyway, so a saner and faster version of this method will probably land soon in trunk.

comment:14 Changed 4 years ago by abeham

  • Status changed from reviewing to readytorelease

I reviewed the changes in r9427 and r9428.

comment:15 Changed 4 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.7 to 3.3.8
Note: See TracTickets for help on using tickets.