Opened 5 years ago

Closed 5 years ago

#1672 closed enhancement (done)

Hive trunk integration

Reported by: ascheibe Owned by: ascheibe
Priority: medium Milestone: HeuristicLab 3.3.6
Component: Hive.General Version: 3.3.6
Keywords: Cc:

Description

This ticket is for tracking the trunk integration of HeuristicLab Hive (#1233).

Change History (68)

comment:1 Changed 5 years ago by ascheibe

  • Status changed from new to assigned

comment:2 Changed 5 years ago by ascheibe

r6976 integrated the Hive client projects into trunk (Hive Job Manager and Administrator)

comment:3 Changed 5 years ago by ascheibe

r6977

  • removed unused files
  • added missing license headers

comment:4 Changed 5 years ago by ascheibe

r6979 fixed plugin dependencies

comment:5 Changed 5 years ago by ascheibe

r6983

  • added the Hive Services and Slave projects
  • added missing svn ignores

comment:6 Changed 5 years ago by ascheibe

r6985 added documentation for Hive

comment:7 Changed 5 years ago by ascheibe

r6993

  • removed unused files
  • changed the plugin cache path of the Slave HL App so that HL doesn't discover Hive assemblies
  • cleaned up config files
  • incremented version number of installers to 3.3.6
  • removed Execution time on Hive from Status page because it can't be calculated without the user statistics

comment:8 Changed 5 years ago by ascheibe

r6994

  • removed dead code
  • added Hive assembly references to Tests project
  • fixed problems found by tests

comment:9 Changed 5 years ago by ascheibe

r6995 fixed a typo that led to incorrect assembly paths in the Slave App

comment:10 Changed 5 years ago by ascheibe

r6997 added missing invoke

comment:11 Changed 5 years ago by ascheibe

r6998 Changed how plugin discovery works again because of Hive. It must be possible to move the plugin and working directories away from the original slave working directory. This is needed for the Slave App and, in the future, also for the Windows service, because we don't want it to run as the LocalSystem user. I have removed setting the PrivateBinPath and am now setting the ApplicationBase instead. This doesn't affect HL (because ApplicationBase is set to pluginDir by default anyway) but makes Hive work. Setting the PrivateBinPath doesn't work once the plugin and working directories are moved because (from MSDN): "Private assemblies are deployed in the same directory structure as the application. If the directories specified for PrivateBinPath are not under ApplicationBase, they are ignored."

comment:12 Changed 5 years ago by ascheibe

r7009 moved the Hive services parts to services project as suggested by swagner

comment:13 Changed 5 years ago by ascheibe

  • Owner changed from ascheibe to abeham
  • Status changed from assigned to reviewing

comment:14 Changed 5 years ago by ascheibe

r7014 increased the heartbeat (HB) interval to 20 seconds

comment:15 Changed 5 years ago by ascheibe

r7020

  • increased max. object graph size which can be serialized to allow downloading of big jobs
  • removed more magic numbers
  • increased job polling interval

comment:16 Changed 5 years ago by ascheibe

r7029 fixed a small bug when refreshing permissions

comment:17 Changed 5 years ago by ascheibe

r7032 reverted the change that made the Job Manager throw exceptions

comment:18 Changed 5 years ago by ascheibe

r7045 switched more service calls to transactions with IsolationLevel=ReadUncommitted to prevent DB deadlocks

comment:19 Changed 5 years ago by ascheibe

r7046 added Hive and Benchmarking project dependencies to HeuristicLab-3.3 project

comment:20 Changed 5 years ago by ascheibe

r7047

  • use transactions for status page
  • removed the speed-up charts because they only work with user statistics

comment:21 Changed 5 years ago by ascheibe

r7048

  • removed duplicate events
  • got rid of compiler warnings

comment:22 Changed 5 years ago by ascheibe

r7056

  • disabled drag and drop for Hive Jobs
  • fixed adding and deleting of Hive Tasks
  • show Statelog after a Job is downloaded

comment:23 Changed 5 years ago by ascheibe

r7059

  • Permissions can now be deleted
  • fixed overlay icons for permissions
  • fixed overlay icons in job list

comment:24 Changed 5 years ago by ascheibe

r7067

  • allow drag and drop only for new jobs
  • prepare optimizers on drag and drop

comment:25 Changed 5 years ago by ascheibe

r7068

  • added a default job name
  • added check that a job name is set before upload

comment:26 Changed 5 years ago by ascheibe

r7078 added missing invoke

comment:27 Changed 5 years ago by ascheibe

r7103 cleaned up namespaces of the Job Manager

comment:28 Changed 5 years ago by ascheibe

r7104 fixed graphical glitches and corrected tab order in the job manager

comment:29 Changed 5 years ago by abeham

Are there any more changes coming to this ticket? If so, please take it again, it's currently in reviewing state.

comment:30 Changed 5 years ago by ascheibe

  • Owner changed from abeham to ascheibe
  • Status changed from reviewing to assigned

comment:31 Changed 5 years ago by ascheibe

r7115

  • sped up downloading of tasks by avoiding unnecessary service calls
  • display download progress correctly

comment:32 Changed 5 years ago by ascheibe

r7125

  • removed magic numbers for upload retries
  • sped up job downloading by placing the deserializing/downloading semaphores correctly
  • increased max. number of parallel downloads/deserializations
  • added more status messages when downloading to make it more clear what's actually happening
  • renamed some variables

comment:33 Changed 5 years ago by ascheibe

r7131 try to stop the slave service before uninstalling it

comment:34 Changed 5 years ago by ascheibe

  • Owner changed from ascheibe to abeham
  • Status changed from assigned to reviewing

r7132

  • fixed name of slave windows service

reviewing comments:

  • reduced MaxParallelDownloads to 2 to avoid overloading the CPUs
  • renamed ServiceLocator to HiveServiceLocator
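Taken together, the semaphore fix in r7125 and the MaxParallelDownloads = 2 limit above describe bounded parallel downloading. HeuristicLab is written in C#, so the following is only a Java sketch of the general technique, not the Job Manager's code. `BoundedDownloader`, `download` and `downloadAll` are hypothetical names, and the transfer is simulated with a sleep. The permit is held only around the transfer itself, so the thread pool can be larger than the number of concurrent downloads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedDownloader {
    // Limit taken from the review note; all names here are hypothetical.
    static final int MAX_PARALLEL_DOWNLOADS = 2;

    private final Semaphore downloadSlots = new Semaphore(MAX_PARALLEL_DOWNLOADS);
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    // Simulated download of one task; a permit is held only while "transferring".
    String download(int taskId) throws InterruptedException {
        downloadSlots.acquire();
        try {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            Thread.sleep(20); // stands in for the network transfer
            return "task-" + taskId;
        } finally {
            active.decrementAndGet();
            downloadSlots.release();
        }
    }

    int maxObservedParallelism() {
        return maxActive.get();
    }

    // The pool has 8 threads, but the semaphore caps concurrent transfers at 2.
    // Results are returned in submission order.
    List<String> downloadAll(int count) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                futures.add(pool.submit(() -> download(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        BoundedDownloader d = new BoundedDownloader();
        List<String> results = d.downloadAll(10);
        System.out.println(results.size() + " tasks, max parallel = " + d.maxObservedParallelism());
    }
}
```

Lowering the permit count throttles CPU and network load without changing the thread pool, which matches the intent of the review note.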

comment:35 Changed 5 years ago by abeham

r7133

  • Added missing configurations to Clients.Hive-3.3 project

comment:36 Changed 5 years ago by ascheibe

r7135 some fixes for the Slave tray UI:

  • removed some more magic numbers
  • fixed reconnecting to the Windows service when it was stopped
  • added more time for stopping/starting the Windows service so that no timeout exception is thrown

comment:37 Changed 5 years ago by abeham

EngineHiveTask:

  • please clean up the GetAsTaskData method; there's an uncommented lock. Is it necessary? Why was it there in the first place?

OptimizerHiveTask:

  • GetNewRunName and GetRunNumber: idx will be 3, not -1, if the string cannot be found
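This kind of bug typically appears when a fixed offset is added to the result of an index search before the "not found" check. The Java sketch below illustrates it (HeuristicLab itself is C#). It assumes a hypothetical marker string `"Run "` of length 4, so a failed search yields -1 + 4 = 3, the value mentioned in the review. `RunNameParser` and its methods are illustrative only.

```java
final class RunNameParser {
    // Hypothetical marker; the real methods may search for a different string.
    private static final String MARKER = "Run ";

    // Buggy variant: the offset is added before checking for "not found",
    // so a missing marker produces -1 + 4 = 3 instead of -1.
    static int buggyIndex(String name) {
        return name.lastIndexOf(MARKER) + MARKER.length();
    }

    // Fixed variant: test the raw lastIndexOf result first, then apply the offset.
    static int runNumber(String name) {
        int idx = name.lastIndexOf(MARKER);
        if (idx == -1) {
            return -1; // marker not found
        }
        String digits = name.substring(idx + MARKER.length()).trim();
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            return -1; // marker found but not followed by a number
        }
    }

    public static void main(String[] args) {
        System.out.println(buggyIndex("Experiment")); // 3
        System.out.println(runNumber("Experiment"));  // -1
        System.out.println(runNumber("GA Run 7"));    // 7
    }
}
```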

TaskData:

  • is contained in the file JobData.cs; the file should be renamed to TaskData.cs

PersistenceUtil:

  • would be nicer to use the using pattern for MemoryStream
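C#'s `using` statement disposes a `MemoryStream` even when serialization throws. Java's try-with-resources gives the same guarantee. The Java sketch below is an analog only: it is not HeuristicLab's PersistenceUtil, which is C#. `PersistenceSketch` and its methods are hypothetical, and Java's built-in object serialization stands in for HeuristicLab's own persistence.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class PersistenceSketch {
    // Both streams are declared in the resource header, so they are closed
    // (in reverse order) on every exit path, including exceptions.
    static byte[] serialize(Serializable obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush(); // push buffered data into 'bytes' before copying it out
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = deserialize(serialize("task data"));
        System.out.println(copy); // task data
    }
}
```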

JobResultPoller:

  • Why is stopRequested a property: private bool stopRequested { get; set; }

There are some singletons (e.g. HiveClient) that are not fully thread-safe (although for HiveClient this is probably not a problem). The MSDN implementation pattern for singletons suggests using either double-checked locking or static initialization (private static readonly HiveClient instance = new HiveClient();). Interestingly, the article states that static initialization is also lazy, because the field is private and accessed only through the Instance property.
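The two thread-safe variants mentioned above can be illustrated outside .NET. HeuristicLab is C#, so this is only a Java analog of the patterns. `EagerClient` and `LockedClient` are hypothetical stand-ins for a class like HiveClient, not HeuristicLab code. In Java, a `static final` field is initialized under the JVM's class-initialization lock the first time the class is actively used. The double-checked variant is only safe if the field is `volatile`.

```java
// Static initialization: the JVM runs the initializer exactly once,
// on first active use of the class, under its class-initialization lock.
final class EagerClient {
    private static final EagerClient INSTANCE = new EagerClient();

    private EagerClient() { }

    static EagerClient getInstance() {
        return INSTANCE;
    }
}

// Double-checked locking: lock only while the instance is still missing.
final class LockedClient {
    // volatile prevents other threads from seeing a partially constructed object.
    private static volatile LockedClient instance;
    private static final Object LOCK = new Object();

    private LockedClient() { }

    static LockedClient getInstance() {
        LockedClient local = instance;
        if (local == null) {             // first check, without locking
            synchronized (LOCK) {
                local = instance;
                if (local == null) {     // second check, under the lock
                    local = new LockedClient();
                    instance = local;
                }
            }
        }
        return local;
    }
}

final class SingletonDemo {
    public static void main(String[] args) throws InterruptedException {
        LockedClient[] seen = new LockedClient[8];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> seen[slot] = LockedClient.getInstance());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        for (LockedClient c : seen) {
            if (c != seen[0]) throw new AssertionError("more than one instance");
        }
        System.out.println(EagerClient.getInstance() == EagerClient.getInstance()); // true
    }
}
```

Static initialization is shorter and needs no explicit locking. Double-checked locking mainly pays off when construction must be deferred beyond the first use of the class itself.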

I'm a bit confused that the HiveServiceLocator.Instance property provides a public setter!?

comment:38 Changed 5 years ago by ascheibe

r7142 implemented reviewing comments

comment:39 Changed 5 years ago by ascheibe

Thanks for your comments. Concerning the EngineHiveTask: I don't think the locks are necessary. The GetAsTaskData is only used when uploading tasks and one task is only uploaded once, so this should be ok.

comment:40 Changed 5 years ago by ascheibe

r7144 stop result polling before deleting a job

comment:41 Changed 5 years ago by ascheibe

r7146 admin UI:

  • improved treeview
  • added more icons and tooltips

comment:42 Changed 5 years ago by ascheibe

r7152

  • added missing invokes
  • removed the collection of object-graph types for plugin retrieval, as this isn't needed for MetaOpt. I think the cause of the Hive problem (#1527) was that MetaOpt always stored the types of all available algorithms and problems.

comment:43 Changed 5 years ago by ascheibe

r7156 fixed starting, pausing and stopping of jobs and tasks

comment:44 Changed 5 years ago by ascheibe

r7157 improved event logging in the Hive service

comment:45 Changed 5 years ago by ascheibe

r7158 another event logging fix

comment:46 Changed 5 years ago by abeham

I just tested downloading a running job (1250 tasks). The download worked, but the Job Manager never left the populating/updating display state. I then switched the view to another job and back to the just-downloaded job: the application locked up without using any CPU. I gave you permission to view the job; it's called "QAPLIB SA parameter range (lipa20b - tai50b)". I'm pretty sure it'll still be running tomorrow, so you can test it yourself.

comment:47 Changed 5 years ago by ascheibe

r7162 throw exceptions in Job Manager so that we can see if something went wrong

comment:48 Changed 5 years ago by ascheibe

r7164 set taskDataInvalid when deserializing fails

comment:49 Changed 5 years ago by ascheibe

r7165 build the task tree first and then display it. This should be lighter on the CPU.

comment:50 Changed 5 years ago by ascheibe

r7166

  • don't serialize the results twice before uploading
  • made slave a little bit more robust

comment:51 Changed 5 years ago by ascheibe

r7171 communication with the UI should be more stable now

comment:52 Changed 5 years ago by ascheibe

r7177 fixed setting of priorities in the Job Manager

comment:53 Changed 5 years ago by ascheibe

r7178 disabled checking for parent tasks that should be calculated, because this case doesn't exist at the moment

comment:54 Changed 5 years ago by ascheibe

r7182 switched to HeuristicLab Log and LogView for the Slave UI

comment:55 Changed 5 years ago by ascheibe

r7185

  • possible fix for the slave hang problem: don't host the service on the thread it was created on
  • added a trigger for deleting slavestatistics when statistics are deleted

comment:56 Changed 5 years ago by ascheibe

r7187

  • increased times between life cycles on the server
  • some smaller performance improvements on the server

comment:57 Changed 5 years ago by ascheibe

r7189

  • increased timeout for slaves/tasks to 3 minutes
  • moved the cleanup functionality to its own Windows service. This will hopefully increase performance, because until now it was done within the heartbeat calls.

comment:58 Changed 5 years ago by ascheibe

r7190

  • fixed a typo
  • increased transferring timeout

comment:59 Changed 5 years ago by ascheibe

r7192 some small Job Manager UI fixes

comment:60 Changed 5 years ago by ascheibe

r7191

  • tooltip now shows the name and the id of a task
  • the date is now shown on the x-axis for runs spanning multiple days

comment:61 Changed 5 years ago by abeham

Deleting a job while it's being downloaded causes HL to crash

comment:62 Changed 5 years ago by ascheibe

Thanks for the bug report!
r7200: don't allow deleting jobs while they are uploading/downloading

comment:63 Changed 5 years ago by ascheibe

r7217 fixed compiler warning

comment:64 Changed 5 years ago by ascheibe

r7218 renamed some jobs to tasks

comment:65 Changed 5 years ago by ascheibe

r7219 renamed wrongly named folder

comment:66 Changed 5 years ago by ascheibe

r7222 don't crash if there are no child hive tasks

comment:67 Changed 5 years ago by abeham

  • Owner changed from abeham to ascheibe
  • Status changed from reviewing to readytorelease

The last test with several computers joining was successful and no further bugs have been identified.

comment:68 Changed 5 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.5 to 3.3.6