Opened 4 months ago

Last modified 5 weeks ago

#3062 accepted defect

Hive janitor does not properly clean up / generate statistics

Reported by: jkarder
Owned by: jkarder
Priority: medium
Milestone: HeuristicLab 3.3.17
Component: Hive.General
Version: trunk
Keywords:
Cc:

Description

There are two problems. First, some queries executed by the janitor time out. Second, the FILESTREAM garbage collector does not appear to run, which leaves a large amount of data on disk that is no longer referenced by any row.

Note: FILESTREAM garbage collection can be forced with the SQL Server stored procedure sp_filestream_force_garbage_collection.
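
As a rough illustration of the note above, the following Python sketch assembles the call to sp_filestream_force_garbage_collection as a parameterized statement. The helper name build_force_gc_call and the database name "HiveDb" are hypothetical. The sketch only builds the statement with qmark-style placeholders (the style used by ODBC-based drivers) and does not connect to a server.

```python
from typing import List, Optional, Tuple


def build_force_gc_call(db_name: str,
                        container: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (T-SQL text, parameters) for a forced FILESTREAM GC run.

    Hypothetical helper for illustration; not part of HeuristicLab.
    """
    sql = "EXEC sp_filestream_force_garbage_collection @dbname = ?"
    params = [db_name]
    if container is not None:
        # @filename restricts the run to one FILESTREAM container,
        # identified by its logical file name.
        sql += ", @filename = ?"
        params.append(container)
    return sql, params


if __name__ == "__main__":
    # "HiveDb" is a placeholder, not the actual Hive database name.
    statement, parameters = build_force_gc_call("HiveDb")
    print(statement, parameters)
```

The returned pair could then be passed to a database cursor's execute method, assuming the driver accepts qmark placeholders.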

Change History (2)

comment:1 Changed 5 weeks ago by jkarder

  • Status changed from new to accepted

comment:2 Changed 5 weeks ago by jkarder

  • Version set to trunk

r17574: overhauled statistics generation and cleanup

  • switched to a single thread for database cleanup and statistics generation (executed sequentially)
  • switched to preemptive deletion of items that are in status DeletionPending (for jobs: statelogs, taskdata, tasks)
  • added code that aborts tasks whose jobs have already been marked for deletion
  • added method UseTransactionAndSubmit in addition to UseTransaction in PersistenceManager
  • updated DAO methods and introduced more bare-metal SQL
  • introduced DAO methods for batch deletion
  • fixed usage of enum values in DAO SQL queries
  • deleted unnecessary triggers tr_JobDeleteCascade and tr_TaskDeleteCascade in Prepare Hive Database.sql
  • changed scheduling for less interference with janitor and other heartbeats
    • increased scheduling patience from 20 to 70 seconds (to wait longer to get the mutex for scheduling)
    • changed signature of ITaskScheduler.Schedule
    • added base class for TaskSchedulers and moved assignment of tasks to slaves into it
    • changed RoundRobinTaskScheduler to use bare-metal SQL
  • made MessageContainer a storable type (leftover)
  • updated HiveJanitorServiceInstaller.nsi
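
The batch deletion and the preemptive cleanup of DeletionPending jobs listed above can be sketched as follows. This is a minimal illustration in Python against an in-memory SQLite database, not the actual C# DAO code from r17574. The table and column names (Job, Task, StateLog, TaskData, State) are simplified stand-ins, and removing the job row itself at the end is an assumption. The idea is to delete in small chunks and commit after each chunk, so that no single statement runs long enough to hit a timeout.

```python
import sqlite3

PENDING = "DeletionPending"


def delete_in_batches(conn, table, where, params=(), batch_size=1000):
    """Delete rows matching `where` in chunks; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE {where} LIMIT ?)",
            (*params, batch_size),
        )
        conn.commit()  # keep every transaction short
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def purge_pending_jobs(conn, batch_size=1000):
    """Remove state logs, task data and tasks of jobs pending deletion."""
    pending_jobs = "SELECT JobId FROM Job WHERE State = ?"
    pending_tasks = f"SELECT TaskId FROM Task WHERE JobId IN ({pending_jobs})"
    counts = {}
    # Children first, so tasks are still present while their rows are matched.
    for table in ("StateLog", "TaskData"):
        counts[table] = delete_in_batches(
            conn, table, f"TaskId IN ({pending_tasks})", (PENDING,), batch_size)
    counts["Task"] = delete_in_batches(
        conn, "Task", f"JobId IN ({pending_jobs})", (PENDING,), batch_size)
    # Assumption: the job row itself is removed last.
    counts["Job"] = delete_in_batches(
        conn, "Job", "State = ?", (PENDING,), batch_size)
    return counts


def build_demo_db():
    """Create a tiny hypothetical schema: job 1 is pending deletion, job 2 is not."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Job (JobId INTEGER PRIMARY KEY, State TEXT);
        CREATE TABLE Task (TaskId INTEGER PRIMARY KEY, JobId INTEGER);
        CREATE TABLE StateLog (StateLogId INTEGER PRIMARY KEY, TaskId INTEGER);
        CREATE TABLE TaskData (TaskId INTEGER PRIMARY KEY, Data BLOB);
    """)
    conn.execute("INSERT INTO Job VALUES (1, 'DeletionPending'), (2, 'Online')")
    conn.executemany("INSERT INTO Task VALUES (?, ?)",
                     [(t, 1 if t <= 5 else 2) for t in range(1, 8)])
    conn.executemany("INSERT INTO StateLog (TaskId) VALUES (?)",
                     [(t,) for t in range(1, 8) for _ in range(2)])
    conn.executemany("INSERT INTO TaskData VALUES (?, ?)",
                     [(t, b"x") for t in range(1, 8)])
    conn.commit()
    return conn


if __name__ == "__main__":
    print(purge_pending_jobs(build_demo_db(), batch_size=2))
```

A small batch size is used in the demo only to force several rounds; a real cleanup would pick a size that balances round trips against lock duration.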