Opened 14 years ago
Last modified 12 years ago
#1233 closed enhancement
Hive-3.4 development — at Version 116
Reported by: | cneumuel | Owned by: | cneumuel |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.6 |
Component: | Hive.General | Version: | 3.3.6 |
Keywords: | | Cc: | ascheibe |
Description (last modified by cneumuel)
General notes
Server
- Refactor domain objects and db-schema
- Split info-objects and data-objects (like Job and JobData)
- Data Access Layer (more consistent method names, more compact code, inspired by OKB)
- Split transaction and db-context handling
- Allow uploading of plugins for a job (or hiveexperiment)
- Make WCF service completely stateless. Put all remaining state-information into the database (latestHeartbeats, latestConsistencyCheck, newlyAssignedJobs (remove completely and solve by adding a heartbeat))
- StateLog: Log state transitions of jobs.
- Statistics
- Measure core capacity and utilization every minute
- Measure CPU and memory capacity and utilization every minute
- Reliably measure the execution time spent on hive per user / in total. Also measure speedup values (maybe also per minute). Keep deleted jobs in database (flag them) - only delete JobData, plugins etc.
- Number of experiments / jobs (per user). Jobs per slave
- Calculate overall productivity per job (waiting time vs. computation time)
- Scheduler
- Consider waiting time to avoid starvation
- Users should have priorities
- A user should be able to manage priorities only in the scope of his own experiments
- Childjobs should automatically have the priorities of their parent jobs
- Precomputed job-queue
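The scheduler items above (waiting time, user priorities, inherited child priorities, precomputed queue) could be combined roughly as follows. This is only a sketch under assumptions: the names `WaitingJob`, `AGING_WEIGHT`, `effective_priority`, `build_queue` and `child_job` are hypothetical and do not come from the Hive code base, and the linear aging formula is one possible choice, not the one Hive uses.

```python
from dataclasses import dataclass


@dataclass
class WaitingJob:
    job_id: str
    user_priority: int      # hypothetical scale: higher means more important
    waiting_minutes: float  # time since the job entered the queue


# Hypothetical tuning constant: priority points gained per minute of waiting.
AGING_WEIGHT = 0.1


def effective_priority(job: WaitingJob) -> float:
    """Combine the user-assigned priority with waiting time (aging)."""
    return job.user_priority + AGING_WEIGHT * job.waiting_minutes


def build_queue(jobs):
    """Precompute the job queue, highest effective priority first."""
    return sorted(jobs, key=effective_priority, reverse=True)


def child_job(parent: WaitingJob, job_id: str) -> WaitingJob:
    """A child job starts with the priority of its parent job."""
    return WaitingJob(job_id, parent.user_priority, 0.0)
```

Because the waiting time keeps increasing the effective priority, a low-priority job eventually overtakes newly submitted high-priority jobs, which is the anti-starvation property asked for above.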
Fix wrong timestamps in statelog on services.heuristiclab.com
Slave
- Adapt Slave for new Server
- Refactor Slave (easier communication between core and executor)
- Tests
- Console Client
- Windows Service Client
- Installer for Slave
- Windows Tray Icon for Slave
- HL App Client
- Sort out problem with uploaded, modified assemblies which aren't downloaded to the slave; Add GUIDs to PluginCache
- Heartbeat interval should be controllable by the server
- Creation of a unique Id for a machine which does not change if the config is deleted
- Correct total physical memory available for a slave (ConfigManager)
- Test sandboxing and security of appdomains. If any assemblies can be uploaded by users, this becomes very important.
- React on SayHello action (call Hello service method)
- Send cpu utilisation with every heartbeat
- Log exceptions to Windows Event Log
- FreeCores needs to be decremented right after a CalculateJob message has been received. Otherwise a slave reports free cores which are already reserved for new jobs.
- PluginTemp directory should be cleaned up from time to time (or on startup)
- SlaveCommListener in Slave.Tests should not be used in ConsoleClient
- Heartbeats are massively delayed, because the heartbeat-method locks on engines (in GetExecutionTimeOfAllJobs) and the same lock is taken in StartJobInAppDomain. This causes a slave-heartbeat-timeout (1 minute), and thus a reset and reassignment of all jobs.
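The FreeCores and heartbeat-lock items above both come down to keeping the core accounting behind its own short-lived lock. A minimal sketch, assuming a hypothetical `CoreAccounting` class (not part of the slave code): cores are reserved as soon as the CalculateJob message arrives, and the heartbeat only reads the counter, so it never waits for a job that is still starting.

```python
import threading


class CoreAccounting:
    """Hypothetical helper: tracks free cores independently of job startup."""

    def __init__(self, total_cores: int):
        self._lock = threading.Lock()
        self._free = total_cores

    def reserve(self, cores: int) -> bool:
        """Call right after a CalculateJob message is received."""
        with self._lock:
            if cores > self._free:
                return False
            self._free -= cores
            return True

    def release(self, cores: int) -> None:
        """Call when a job finishes, fails, or is aborted."""
        with self._lock:
            self._free += cores

    @property
    def free_cores(self) -> int:
        """Value to report in the heartbeat; the lock is held only briefly."""
        with self._lock:
            return self._free
```

The lock protects nothing but the integer, so a slow appdomain start cannot delay the heartbeat past the timeout.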
Experiment Manager
- Show jobs in treeview. Would greatly save screen space and navigation-clicks
- to be enhanced (event wiring)
- Sort HiveExperiments alphabetically
- Plugin-Upload (optional)
- Experiment Sharing
- Appropriate numbering of Runs
- Use Service-Call pattern from OKB (or PPOV-Cockpit)
- Show StateLog - use Gantt Chart like view
- Pause and stop single jobs
- Paused jobs should not be integrated into experiment, so results are not lost. Parameters of paused jobs should be changeable (and used when resumed).
- Deleting jobs after adding them (neither the remove button, nor the del key, nor the context menu entry succeeds in deleting a job (experiment) that has just been dragged in)
Hive Engine
- HiveEngine jobs should have a HiveExperiment, which is marked, so a user cannot see it in HiveExperimentManager. However it should be visible in Administration GUI. If a Hive Engine crashes and cannot delete the experiment, this should be detected by the server and it should be automatically deleted.
- Improve HiveEngine View (list of jobs, with status etc.)
- Stabilize
Administration
Missing WebService Methods:
- GetAllHiveExperiments
- GetUsers
- GetUserStatistics
- GetJobsBySlave -> GetJobsByResourceId
- GetGlobalStatistics (for Statistics TabPage)
- GetScheduleForResource (+ Add/Update/Delete)
TODOS:
- convert HeuristicLab.Calender to a plugin
- use svcutil
- write partial classes for dtos and implement IContent
- build Observable Collections for Users/Slaves/Groups
- add ContentViews for Users and SlaveGroups
- show some fancy statistics
- add Save Button
- integrate HeuristicLab.Services.Hive.Common-3.4 in Server
- get rid of HiveItem etc. on Server
Meeting protocols
Architects meeting (16.06.2011)
DataAccess:
- TransactionManager with interface again
- remove AssignedResourcesId in AssignedResources, use JobId+ResourceId as primary keys
- remove CreateHiveDatabaseApplication. the db schema should not be developed dbml first, since dbml does not support most sql-server features. instead the sql-server schema should be designed first and the dbml should be generated.
- UptimeCalendar should be named DowntimeCalendar
- DataAccess layer and Dao classes should be removed, access to linq to sql should happen directly in server-implementation.
Server
- Lifecycle should be named differently. maybe EventHandler, EventManager.
- put magic numbers into config
- timeout in Lifecycle
- ApplicationConstants
- GetWaitingJobs should be implemented as a stored procedure and should also assign a job to a slave. it should make sure no race conditions occur if it is called concurrently.
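The GetWaitingJobs point above asks for selection and assignment in one race-free step. The sketch below illustrates the idea with a guarded UPDATE; it uses an in-memory SQLite database only as a stand-in for the SQL Server stored procedure, and the table layout, the state names 'Waiting' and 'Calculating', and the function names are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, heavily reduced job table.
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, state TEXT NOT NULL, slave_id TEXT)"
    )


def claim_waiting_job(conn: sqlite3.Connection, slave_id: str):
    """Pick the oldest waiting job and assign it to the slave.

    The UPDATE repeats the state check, so if another caller claimed the
    job in the meantime no row is changed and None is returned.
    """
    with conn:  # commits on success, rolls back on an exception
        row = conn.execute(
            "SELECT id FROM job WHERE state = 'Waiting' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE job SET state = 'Calculating', slave_id = ? "
            "WHERE id = ? AND state = 'Waiting'",
            (slave_id, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None
```

A caller that gets None simply has no job to start in this heartbeat and asks again on the next one.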
HiveExperiment
- rename: HiveExperiment -> Job, Job -> Task
- HiveExperimentPermissions
- the GrantedUserId could be removed
- only Full and Read permissions are necessary (Read: just read!, Full: control, delete, grant permissions)
- remove LastAccessed and IsHiveEngine. there should be a category field instead.
Change History (116)
comment:1 Changed 14 years ago by cneumuel
- Status changed from new to accepted
comment:2 Changed 14 years ago by cneumuel
- Version changed from 3.3 to branch
comment:3 Changed 13 years ago by cneumuel
- Summary changed from Refactore Hive Project Structure to Hive-3.4 development
comment:4 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:5 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:6 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:7 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:8 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:9 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:10 Changed 13 years ago by cneumuel
- Cc ascheibe added
comment:11 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:12 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:13 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:14 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:15 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:16 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:17 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:18 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:19 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:20 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:21 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:22 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:23 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:24 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:25 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:26 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:27 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:28 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:29 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:30 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:31 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:32 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:33 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:34 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:35 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:36 Changed 13 years ago by cneumuel
- updated jobstates documentation
- enhanced ganttChart
- fixed setting of jobstates
- added option to force lifecycle-trigger (mainly for testing purposes)
comment:37 Changed 13 years ago by cneumuel
- Description modified (diff)
r5637 added treeview for hive jobs in experiment manager
comment:38 Changed 13 years ago by ascheibe
r5638 worked on Administration UI
comment:39 Changed 13 years ago by cneumuel
r5675 improved treeview for hive jobs
comment:40 Changed 13 years ago by ascheibe
- Description modified (diff)
r5676 worked on Administration UI
comment:41 Changed 13 years ago by ascheibe
r5677 some minor ui fixes for slave
comment:42 Changed 13 years ago by cneumuel
r5708 changed the way transactions are handled
comment:43 Changed 13 years ago by ascheibe
- Description modified (diff)
- use SlaveComm Endpoint from app.config
- various further slave bugfixes/cleanups
- added preliminary icon for hive slave ui and some slave ui improvements
- added resource deletion to admin ui
- fix service exception thrown if there is no EventLog
comment:44 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:45 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:46 Changed 13 years ago by cneumuel
- fixed statelog when time on server differs from slave or client
- fixed wrong creation of childjobs in experiment manager
- made ganttchartview the default view for statelogs
comment:47 Changed 13 years ago by ascheibe
r5721 worked on slave and slave service installer
comment:48 Changed 13 years ago by ascheibe
- Description modified (diff)
- log uncaught exceptions to an eventlog if available
- fixed job pause bug
comment:49 Changed 13 years ago by cneumuel
- Description modified (diff)
- implemented pause, stop for single jobs
- introduced Command property for jobs (to distinguish between state and command (abort vs. aborted))
- improved behaviour of ItemTreeView (double click opens new window, selected item stays marked)
- fixed bugs in StateLogGanttChartListView and HiveJobView
- fixed cloning of client-side dtos
comment:50 Changed 13 years ago by ascheibe
r5780 various improvements on the service installer and slave tray icon
comment:51 Changed 13 years ago by ascheibe
- fixed job pause bug... again
- general Executor improvements
comment:52 Changed 13 years ago by cneumuel
- implemented correct numbering of BatchRuns
- improvements in ExperimentManager
- fixed bug in server (jobs were scheduled multiple times)
- added exception handling for task in slave
- improved timeout handling of jobs (LifecycleManager)
comment:53 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:54 Changed 13 years ago by cneumuel
r5787 made deleting and creating directories for PluginTemp more robust
comment:55 Changed 13 years ago by ascheibe
- added autostart for tray icon to installer
- machine unique id now includes the machine name
- core: check if job already exists on slave
- already finished jobs now fail and are sent back
comment:56 Changed 13 years ago by ascheibe
r5790 don't save the unique machine id
comment:57 Changed 13 years ago by cneumuel
- Description modified (diff)
- implemented correct downloading of paused jobs. it's now also possible to change parameters and resume an algorithm
- removed Prepare() calls in ExperimentManager and in slave, as it prevents correct resuming of paused jobs
- made events in ItemTreeView be invoked in the correct thread
- reduced log output in ExperimentManager
comment:58 Changed 13 years ago by ascheibe
r5795 various slave and slave tray icon improvements
comment:59 Changed 13 years ago by cneumuel
- ItemTreeView robustifications
- compactified the layout in HiveJobView
comment:60 Changed 13 years ago by ascheibe
r5826 slave ui now receives status information and displays it in doughnut chart
comment:61 Changed 13 years ago by cneumuel
- separated ExperimentMangerClient (OKB-Style, contains business logic) and HiveExperiment (mainly only contains information)
- fixed redundant cloning methods in dtos
- added simple statistics in HiveExperiment which the user can see before downloading an experiment
- added db-delete cascade for slaves and statelogs - now slaves can be safely deleted
comment:62 Changed 13 years ago by cneumuel
r5958 initial port of HiveEngine
comment:63 Changed 13 years ago by cneumuel
r6000 :)
- added GetPlugin service method
- fixed minor issues with double plugins in database
- worked on HiveEngine
- fixed wrong role name for Hive User
- fixed bug in group assignment of slaves
comment:64 Changed 13 years ago by ascheibe
- fix pause/stop bug when serializing big experiments
- use proper newlines
- use GetPlugin(..) instead of GetPlugins()
comment:65 Changed 13 years ago by cneumuel
- changed relationship between Job and HiveExperiment. There is no more HiveExperiment.RootJobId, instead there is Job.HiveExperimentId.
- one HiveExperiment can now have multiple Experiments.
- TreeView supports multiple root nodes
- HiveEngine creates a HiveExperiment for each set of jobs, so jobs cannot be without a parent experiment anymore (no more loose jobs)
- updated ExperimentManager binaries
comment:66 Changed 13 years ago by ascheibe
- increase timeout when sending (for sending large jobs/lots of plugins)
- handle failed GetPluginDatas() properly
comment:67 Changed 13 years ago by gkronber
GetPluginDatas() is a strange identifier. Plural of data is data.
comment:68 Changed 13 years ago by cneumuel
- created baseclass for jobs (ItemJob) from which OperatorJobs and EngineJobs derive
- created special view for OptimizerJobs which derives from a more general view
- removed logic from domain class HiveExperiment and moved it into RefreshableHiveExperiment
- improved ItemTreeView
- corrected plugin dependencies
- fixed bug in database trigger when deleting HiveExperiments
- added delete cascade for Plugin and PluginData
- lots of fixes
comment:69 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:70 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:71 Changed 13 years ago by ascheibe
- Executor now sends all exceptions to the ExperimentManager as NetNamedPipe communication won't be possible in a Sandbox due to security constraints
- count stopped and aborted jobs correctly
- send correct status when a job is stopped by the ExperimentManager
- try to log unhandled exceptions to gui if no EventLog is available
- don't crash if job is sent more than once by server
comment:72 Changed 13 years ago by ascheibe
- Description modified (diff)
- don't lock engines for so long in StartJobInAppDomain
- move SlaveCommListener to ConsoleClient
- delete orphaned job folders at startup
comment:73 Changed 13 years ago by ascheibe
- simplify PreparePlugins
- send more exceptions to ExperimentManager
comment:74 Changed 13 years ago by cneumuel
- renamed engines to executors
- changed locking in StartJobInAppDomain
- avoid destruction of proxy object after 5 minutes for Slave.Core
- added JobStarted event and fixed ExecutionStateChanged and ExecutionTimeChanged
- slaves which are moved to another slavegroup will pause their jobs now, if they must not calculate them
comment:75 Changed 13 years ago by cneumuel
r6111 improved the way jobs are downloaded by ExperimentManager and HiveEngine
comment:76 Changed 13 years ago by ascheibe
- HeartbeatManager: don't sleep while starting jobs
- Executor: make Start() blocking
- shutdown properly if an uncaught exception is thrown
comment:77 Changed 13 years ago by ascheibe
- SlaveTrayIcon: don't try to kill TrayIcons from other users
- split installer to fix config installer bug for users who did not run the installer
comment:78 Changed 13 years ago by ascheibe
r6166 forgot to check in HL icon for installers
comment:79 Changed 13 years ago by ascheibe
- increased send/receive timeout
- renamed hive binding name
comment:80 Changed 13 years ago by cneumuel
- removed Job-dto objects from slave core (since it stores outdated objects)
- added command textbox to HiveJobView
- improved the way the control buttons behave in HiveJobView
- improved job control (pause and stop is also possible when job is not currently calculating)
- improved gantt chart view (last state log entry is also displayed)
- unified code for downloading jobs between experiment manager and hive engine
comment:81 Changed 13 years ago by ascheibe
r6175 temporary switch to privileged sandboxing until communication between core and executor works with sandboxing
comment:82 Changed 13 years ago by cneumuel
- added semaphores to ensure an appdomain is never unloaded when the start method has not finished
- HiveEngine uploading and downloading of jobs works and is displayed in the view
comment:83 Changed 13 years ago by ascheibe
- dropped dependency of Core from Executor
- enabled sandboxing
- moved most parts of Job handling from Core to SlaveJob to simplify locking
- optimized how UsedCores is handled
- SlaveStatusInfo is now thread-safe and counts jobs more correctly
comment:84 Changed 13 years ago by ascheibe
r6204 don't crash on shutdown
comment:85 Changed 13 years ago by cneumuel
r6212 created HiveEngine.Views plugin
comment:86 Changed 13 years ago by ascheibe
- make UsedCores more reliable
- some cosmetic fixes
comment:87 Changed 13 years ago by cneumuel
r6219 improved exception handling for hive experiments
comment:88 Changed 13 years ago by ascheibe
- Slave UI now uses tab pages
- balloon tips are displayed on receiving new jobs
comment:89 Changed 13 years ago by cneumuel
- Description modified (diff)
- added basic statistics recording (once per minute) for
- executiontime per user
- usedcores, usedmemory per slave
comment:90 Changed 13 years ago by ascheibe
- don't set every view as default in slave ui
- fixed bug in PluginCache where files got accessed by multiple threads
comment:91 Changed 13 years ago by ascheibe
- don't set job failed if JobNotFoundException is thrown
- disable AboutView for all items
- avoid NullRefException in SendFinishedJob
comment:92 Changed 13 years ago by ascheibe
- added UAC self elevation for start/stop of windows service
- added slave states and simplified ui commands
comment:93 Changed 13 years ago by ascheibe
- added view for displaying jobs
- improved slave ui
comment:94 Changed 13 years ago by ascheibe
- Description modified (diff)
comment:95 Changed 13 years ago by cneumuel
- Description modified (diff)
- extended statistics recording:
- execution times of users are captured
- execution times and start-to-finish times of finished jobs are captured (to compute hive overhead)
- data of deleted jobs is automatically captured in DeletedJobStatistics
- changed ExecutionTime type in database from string to float (milliseconds are stored instead of TimeSpan.ToString())
- added IsPrivileged field to job to indicate if it should be executed in a privileged sandbox
- added CpuUtilization field to slave to be able to report cpu utilization
- added GetJobsByResourceId to retrieve all jobs which are currently being calculated in a slave(-group)
- TransactionManager now allows to use serializable transactions (used for lifecycle trigger)
comment:96 Changed 13 years ago by cneumuel
- Description modified (diff)
r6269 added CpuUtilization to heartbeats
comment:97 Changed 13 years ago by cneumuel
- refactoring of slave core
- created JobManager, which is responsible for managing jobs without knowing anything about the service. this class is easier to test than slave core
- lots of cleanup
- created console test project for slave
comment:98 Changed 13 years ago by cneumuel
r6362 changed roles authentication to use an AuthenticationManager instead of method attributes. this makes unit tests easier.
comment:99 Changed 13 years ago by cneumuel
- Description modified (diff)
- added consideration of appointments in heartbeats
- code cleanup
comment:100 Changed 13 years ago by ascheibe
- Description modified (diff)
- code cleanups for slave review
- added switch between privileged and unprivileged sandbox
- removed childjob management because it's not used
comment:101 Changed 13 years ago by ascheibe
r6372 changed year to 2011
comment:102 Changed 13 years ago by cneumuel
- moved ExperimentManager into separate plugin
- moved Administration into separate plugin
comment:103 Changed 13 years ago by cneumuel
- locking for childHiveJobs in OptimizerHiveJob avoids multi-threaded access issues
- added IsPrivileged to gui
- minor changes
comment:104 Changed 13 years ago by ascheibe
- implemented usage of checksums for comparing assemblies
- re-added CreateHiveDatabaseApplication.cs to project
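Comparing assemblies by checksum, as mentioned in this comment, might look roughly like the sketch below. The hash algorithm is not stated in the ticket, so SHA-256 is an assumption here, and `checksum` and `needs_download` are hypothetical names.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex digest of an assembly's bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def needs_download(cached: dict, name: str, server_checksum: str) -> bool:
    """True if the slave's cache lacks the assembly or holds other content.

    `cached` maps assembly file names to the checksum of the cached copy.
    """
    return cached.get(name) != server_checksum
```

With this check a modified assembly that keeps its name and version is still detected, which addresses the problem of uploaded, modified assemblies not being downloaded to the slave.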
comment:105 Changed 13 years ago by abeham
- fixed references to absolute path references
comment:106 Changed 13 years ago by cneumuel
- created events when statelog changed
- fixed memory leak in hiveengine
- extended timeout for long running transactions and database contexts (when jobdata is stored)
- replaced random guids in database with sequential guids for performance reasons
- minor fixes and cleanups
- updated hive binaries
- updated statistics
comment:107 Changed 13 years ago by abeham
- synchronized config file with that from trunk
comment:108 Changed 13 years ago by abeham
- Description modified (diff)
Added TODO point regarding the deletion of jobs in the experiment manager
comment:109 Changed 13 years ago by ascheibe
r6426 removed useLocalPlugins
comment:110 Changed 13 years ago by cneumuel
r6431 - applied some review comments
General:
- changed Log to ThreadSafeLog
- added license information to all files
- added assembly descriptions
- using blocks before namespace
HeuristicLab.Services.Hive.DataAccess:
- made TransactionManager static
- removed DaoException
- removed TimeSpanExtensions
- renamed prepareHiveDatabase.sql to Prepare Hive Database.sql
- created Initialize Hive Database.sql
comment:111 Changed 13 years ago by cneumuel
- some cleanup in HiveEngine
- using ThreadSafeLog instead of synchronized methods
comment:112 Changed 13 years ago by ascheibe
r6437 Admin UI:
- some bugfixes
- removed dummy stuff
comment:113 Changed 13 years ago by cneumuel
- stability improvements for HiveExperiment and HiveEngine
- parallelized upload of jobs
- enabled cancellation of job upload
- reduced the amount of double-assignment of jobs by an additional check in HeartbeatManager
- tried to tackle the amount of deadlocks by automatically rerunning transactions
- some fixes
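Automatically rerunning a transaction after a deadlock, as described in this comment, can be sketched as a small retry wrapper. `DeadlockError` and `run_with_retry` are hypothetical names, and the attempt count and delay are assumed defaults, not the values used on the Hive server.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the database's deadlock-victim error."""


def run_with_retry(transaction, max_attempts: int = 3, delay_seconds: float = 0.0):
    """Run `transaction` and rerun it if it was chosen as deadlock victim.

    The last failure is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

This only works if the wrapped transaction is safe to repeat from the start, since everything it did before the deadlock has been rolled back.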
comment:114 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:115 Changed 13 years ago by cneumuel
- Description modified (diff)
comment:116 Changed 13 years ago by cneumuel
- Description modified (diff)
r5633 added Appointment/Schedule ws and dao methods