
Opened 14 years ago

Last modified 12 years ago

#1233 closed enhancement

Hive-3.4 development — at Version 116

Reported by: cneumuel Owned by: cneumuel
Priority: medium Milestone: HeuristicLab 3.3.6
Component: Hive.General Version: 3.3.6
Keywords: Cc: ascheibe

Description (last modified by cneumuel)

General notes

Server

  • Refactor domain objects and db-schema
    • Split info-objects and data-objects (like Job and JobData)
  • Data Access Layer (more consistent method names, more compact code, inspired by OKB)
  • Split transaction and db-context handling
  • Allow uploading of plugins for a job (or hiveexperiment)
  • Make the WCF service completely stateless. Put all remaining state information into the database (latestHeartbeats, latestConsistencyCheck, newlyAssignedJobs; the latter should be removed completely and solved by adding a heartbeat)
  • StateLog: Log state transitions of jobs.
  • Statistics
    • Measure core capacity and utilization every minute
    • Measure CPU and memory capacity and utilization every minute
    • Reliably measure the execution time spent on Hive per user and in total. Also measure speedup values (possibly also per minute). Keep deleted jobs in the database (flag them); only delete JobData, plugins, etc.
    • Number of experiments/jobs (per user); jobs per slave
    • Calculate overall productivity per job (waiting time vs. computation time)
  • Scheduler
    • Consider waiting time to avoid starvation
    • Users should have priorities
    • A user should be able to manage priorities only in the scope of his own experiments
    • Child jobs should automatically inherit the priorities of their parent jobs
    • Precomputed job-queue
  • Fix wrong timestamps in statelog on services.heuristiclab.com
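A minimal sketch of how the Scheduler items above could fit together, assuming a two-level choice: first pick a user by user priority plus an aging bonus for that user's longest-waiting job (so nobody starves), then pick among that user's jobs by job priority, with child jobs inheriting the priority of their nearest ancestor. The `Job` fields, `next_job` and `aging_per_minute` are hypothetical names for illustration, not the Hive 3.4 scheduler.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    # Hypothetical fields for illustration; not the Hive 3.4 data model.
    job_id: str
    owner: str
    priority: Optional[int]  # None means "inherit from the parent job"
    waiting_minutes: float
    parent: Optional["Job"] = None

    def effective_priority(self) -> int:
        # Child jobs inherit the priority of the nearest ancestor that sets one.
        node: Optional[Job] = self
        while node is not None:
            if node.priority is not None:
                return node.priority
            node = node.parent
        return 0


def next_job(waiting: List[Job], user_priority: Dict[str, int],
             aging_per_minute: float = 0.1) -> Optional[Job]:
    if not waiting:
        return None
    jobs_by_owner: Dict[str, List[Job]] = defaultdict(list)
    for job in waiting:
        jobs_by_owner[job.owner].append(job)

    def owner_score(owner: str) -> float:
        # Aging: the longer a user's oldest job has waited, the higher the
        # score, so users with low priority are not starved forever.
        oldest = max(j.waiting_minutes for j in jobs_by_owner[owner])
        return user_priority.get(owner, 0) + aging_per_minute * oldest

    owner = max(jobs_by_owner, key=owner_score)
    # Job priorities only compete within the chosen user's own jobs;
    # the longer waiting time breaks ties.
    return max(jobs_by_owner[owner],
               key=lambda j: (j.effective_priority(), j.waiting_minutes))
```

Because job priorities are only compared among the chosen user's jobs, raising them cannot push a user's jobs ahead of other users, which matches the requirement that users manage priorities only within their own experiments.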

Slave

  • Adapt Slave for new Server
  • Refactor Slave (easier communication between core and executor)
  • Tests
  • Console Client
  • Windows Service Client
  • Installer for Slave
  • Windows Tray Icon for Slave
  • HL App Client
  • Sort out the problem with uploaded, modified assemblies that aren't downloaded to the slave; add GUIDs to PluginCache
  • Heartbeat interval should be controllable by the server
  • Create a unique ID for a machine that does not change if the config is deleted
  • Correct total physical memory available for a slave (ConfigManager)
  • Test sandboxing and security of AppDomains. This becomes very important if users can upload arbitrary assemblies.
  • React on SayHello action (call Hello service method)
  • Send CPU utilization with every heartbeat
  • Log exceptions to Windows Event Log
  • FreeCores needs to be decremented right after a CalculateJob message has been received. Otherwise a slave reports free cores which are already reserved for new jobs.
  • PluginTemp directory should be cleaned up from time to time (or on startup)
  • SlaveCommListener in Slave.Tests should not be used in ConsoleClient
  • Heartbeats are massively delayed, because the heartbeat method locks on engines (in GetExecutionTimeOfAllJobs) and StartJobInAppDomain takes the same lock. This causes a slave heartbeat timeout (1 minute) and thus a reset and reassignment of all jobs.
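The FreeCores and heartbeat-lock items share one idea: reserve cores the moment a CalculateJob message arrives, and hold the lock only for the counter update, not for the slow job start. A rough sketch of that bookkeeping; `CoreBookkeeping` and `on_calculate_job` are hypothetical names, and the real slave starts jobs in an AppDomain rather than through a plain callback.

```python
import threading
from typing import Callable


class CoreBookkeeping:
    """Hypothetical slave-side core accounting (not the actual Hive slave code)."""

    def __init__(self, total_cores: int) -> None:
        self._lock = threading.Lock()
        self._total = total_cores
        self._used = 0

    def reserve(self, cores: int) -> bool:
        # Called as soon as a CalculateJob message arrives, before the job is
        # started, so the next heartbeat no longer reports these cores as free.
        with self._lock:
            if self._used + cores > self._total:
                return False
            self._used += cores
            return True

    def release(self, cores: int) -> None:
        # Called when a job finishes, fails, or could not be started.
        with self._lock:
            self._used = max(0, self._used - cores)

    def free_cores(self) -> int:
        # Heartbeats only take this short lock; they never wait for a job start.
        with self._lock:
            return self._total - self._used


def on_calculate_job(cores: CoreBookkeeping, needed: int,
                     start_job: Callable[[], None]) -> bool:
    # Reserve first; the slow start runs outside the lock.
    if not cores.reserve(needed):
        return False
    try:
        start_job()
    except Exception:
        cores.release(needed)  # give the cores back if the start failed
        raise
    return True
```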

Experiment Manager

  • Show jobs in a treeview. This would greatly save screen space and navigation clicks
    • to be enhanced (event wiring)
  • Sort HiveExperiments alphabetically
  • Plugin-Upload (optional)
  • Experiment Sharing
  • Appropriate numbering of Runs
  • Use Service-Call pattern from OKB (or PPOV-Cockpit)
  • Show StateLog - use Gantt Chart like view
  • Pause and stop single jobs
  • Paused jobs should not be integrated into the experiment, so that results are not lost. Parameters of paused jobs should be changeable (and used when resumed).
  • Deleting jobs after adding them: neither the remove button, nor the Del key, nor the context menu entry succeeds in deleting a job (experiment) that has just been dragged in

Hive Engine

  • HiveEngine jobs should have a HiveExperiment which is marked so that a user cannot see it in the HiveExperimentManager. However, it should be visible in the Administration GUI. If a Hive Engine crashes and cannot delete the experiment, the server should detect this and delete the experiment automatically.
  • Improve HiveEngine View (list of jobs, with status etc.)
  • Stabilize

Administration

Missing WebService Methods:

  • GetAllHiveExperiments
  • GetUsers
  • GetUserStatistics
  • GetJobsBySlave -> GetJobsByResourceId
  • GetGlobalStatistics (for Statistics TabPage)
  • GetScheduleForResource (+ Add/Update/Delete)

TODOS:

  • convert HeuristicLab.Calender to a plugin
  • use svcutil
  • write partial classes for dtos and implement IContent
  • build Observable Collections for Users/Slaves/Groups
  • add ContentViews for Users and SlaveGroups
  • show some fancy statistics
  • add Save Button
  • integrate HeuristicLab.Services.Hive.Common-3.4 in Server
  • get rid of HiveItem etc. on Server

Meeting protocols

Architects meeting (16.06.2011)

DataAccess:

  • TransactionManager with interface again
  • remove AssignedResourcesId in AssignedResources; use JobId+ResourceId as a composite primary key
  • remove CreateHiveDatabaseApplication. The DB schema should not be developed DBML-first, since DBML does not support most SQL Server features. Instead, the SQL Server schema should be designed first and the DBML generated from it.
  • UptimeCalendar should be renamed to DowntimeCalendar
  • The DataAccess layer and Dao classes should be removed; LINQ to SQL should be accessed directly in the server implementation.

Server

  • Lifecycle should be renamed, e.g. to EventHandler or EventManager.
  • put magic numbers into config
    • timeout in Lifecycle
    • ApplicationConstants
  • GetWaitingJobs should be implemented as a stored procedure and should also assign a job to a slave. It must ensure that no race conditions occur when it is called concurrently.
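The race-condition requirement for GetWaitingJobs boils down to a compare-and-set: an assignment may only succeed if the job is still waiting at the moment of the update. The sketch below shows the idea with an in-memory SQLite table instead of a SQL Server stored procedure; the table layout, `assign_waiting_job` and the state names `Waiting` and `Transferring` are assumptions.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, heavily simplified job table.
    conn.execute(
        "CREATE TABLE Job (JobId TEXT PRIMARY KEY, State TEXT NOT NULL, "
        "SlaveId TEXT, Priority INTEGER NOT NULL DEFAULT 0)"
    )


def assign_waiting_job(conn: sqlite3.Connection, slave_id: str) -> Optional[str]:
    """Try to claim one waiting job for slave_id; return its id or None."""
    candidates = conn.execute(
        "SELECT JobId FROM Job WHERE State = 'Waiting' ORDER BY Priority DESC"
    ).fetchall()
    for (job_id,) in candidates:
        # Compare-and-set: the UPDATE only matches if nobody else claimed the
        # job since the SELECT, so two concurrent callers cannot both win it.
        cur = conn.execute(
            "UPDATE Job SET State = 'Transferring', SlaveId = ? "
            "WHERE JobId = ? AND State = 'Waiting'",
            (slave_id, job_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return job_id
    return None
```

A stored procedure could apply the same conditional UPDATE and check the number of affected rows.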

HiveExperiment

  • rename: HiveExperiment -> Job, Job -> Task
  • HiveExperimentPermissions
    • the GrantedUserId could be removed
    • only Full and Read permissions are necessary (Read: read only; Full: control, delete, grant permissions)
  • remove LastAccessed and IsHiveEngine; there should be a category field instead.

Change History (116)

comment:1 Changed 14 years ago by cneumuel

  • Status changed from new to accepted

comment:2 Changed 14 years ago by cneumuel

  • Version changed from 3.3 to branch

comment:3 Changed 13 years ago by cneumuel

  • Summary changed from Refactore Hive Project Structure to Hive-3.4 development

comment:4 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:5 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:6 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:7 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:8 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:9 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:10 Changed 13 years ago by cneumuel

  • Cc ascheibe added

comment:11 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:12 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:13 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:14 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:15 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:16 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:17 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:18 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:19 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:20 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:21 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:22 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:23 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:24 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:25 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:26 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:27 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:28 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:29 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:30 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:31 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:32 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:33 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:34 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:35 Changed 13 years ago by ascheibe

  • Description modified (diff)

r5633 added Appointment/Schedule ws and dao methods

comment:36 Changed 13 years ago by cneumuel

r5636

  • updated jobstates documentation
  • enhanced ganttChart
  • fixed setting of jobstates
  • added option to force lifecycle-trigger (mainly for testing purposes)

comment:37 Changed 13 years ago by cneumuel

  • Description modified (diff)

r5637 added treeview for hive jobs in experiment manager

Last edited 13 years ago by cneumuel (previous) (diff)

comment:38 Changed 13 years ago by ascheibe

r5638 worked on Administration UI

comment:39 Changed 13 years ago by cneumuel

r5675 improved treeview for hive jobs

Last edited 13 years ago by cneumuel (previous) (diff)

comment:40 Changed 13 years ago by ascheibe

  • Description modified (diff)

r5676 worked on Administration UI

comment:41 Changed 13 years ago by ascheibe

r5677 some minor ui fixes for slave

comment:42 Changed 13 years ago by cneumuel

r5708 changed the way transactions are handled

Last edited 13 years ago by cneumuel (previous) (diff)

comment:43 Changed 13 years ago by ascheibe

  • Description modified (diff)

r5711

  • use SlaveComm Endpoint from app.config
  • various further slave bugfixes/cleanups
  • added preliminary icon for hive slave ui and some slave ui improvements
  • added resource deletion to admin ui
  • fix service exception thrown if there is no EventLog

comment:44 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:45 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:46 Changed 13 years ago by cneumuel

r5718

  • fixed statelog when time on server differs from slave or client
  • fixed wrong creation of childjobs in experiment manager
  • made the Gantt chart view the default view for statelogs

comment:47 Changed 13 years ago by ascheibe

r5721 worked on slave and slave service installer

comment:48 Changed 13 years ago by ascheibe

  • Description modified (diff)

r5778

  • log uncaught exceptions to an eventlog if available
  • fixed job pause bug

comment:49 Changed 13 years ago by cneumuel

  • Description modified (diff)

r5779

  • implemented pause, stop for single jobs
  • introduced Command property for jobs (to distinguish between state and command (abort vs. aborted))
  • improved behaviour of ItemTreeView (double click opens new window, selected item stays marked)
  • fixed bugs in StateLogGanttChartListView and HiveJobView
  • fixed cloning of client-side dtos
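The Command property separates what a user asked for from what actually happened, e.g. the command Abort versus the state Aborted. A minimal sketch, assuming only Pause and Abort commands and a small hypothetical set of states: the client only records the command, and the slave later applies it and updates the state.

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    # Hypothetical subset: what the job currently *is*.
    CALCULATING = "Calculating"
    PAUSED = "Paused"
    ABORTED = "Aborted"


class Command(Enum):
    # What a user *asked for*; pending until a slave has acted on it.
    PAUSE = "Pause"
    ABORT = "Abort"


# Which state each command leads to once the slave has carried it out.
RESULTING_STATE = {Command.PAUSE: JobState.PAUSED, Command.ABORT: JobState.ABORTED}


class Job:
    def __init__(self) -> None:
        self.state = JobState.CALCULATING
        self.command: Optional[Command] = None

    def request(self, command: Command) -> None:
        # Client side: only record the request; the state does not change yet.
        self.command = command

    def apply_pending_command(self) -> None:
        # Slave side (e.g. after a heartbeat): act on the command, report the
        # new state and clear the command.
        if self.command is not None:
            self.state = RESULTING_STATE[self.command]
            self.command = None
```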

comment:50 Changed 13 years ago by ascheibe

r5780 various improvements on the service installer and slave tray icon

comment:51 Changed 13 years ago by ascheibe

r5782

  • fixed job pause bug... again
  • general Executor improvements

comment:52 Changed 13 years ago by cneumuel

r5786

  • implemented correct numbering of BatchRuns
  • improvements in ExperimentManager
  • fixed bug in server (jobs were scheduled multiple times)
  • added exception handling for task in slave
  • improved timeout handling of jobs (LifecycleManager)

comment:53 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:54 Changed 13 years ago by cneumuel

r5787 made deleting and creating directories for PluginTemp more robust

comment:55 Changed 13 years ago by ascheibe

r5789

  • added autostart for tray icon to installer
  • machine unique id now includes the machine name
  • core: check if job already exists on slave
  • already finished jobs now fail and are sent back
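Together with r5790 and the Slave item about an ID that survives deleting the config, one way to get such an ID without storing it is to derive it deterministically from the hardware address and the machine name. A sketch, assuming a name-based UUID (version 5) over both values; the namespace, the input format and the function names are hypothetical, not what the slave actually does.

```python
import platform
import uuid

# Hypothetical fixed namespace so that every slave derives IDs the same way.
SLAVE_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:hive-slave")


def machine_id(mac: int, machine_name: str) -> uuid.UUID:
    # Deterministic: the same inputs always give the same ID, so it never has
    # to be saved and deleting the config does not change it. The name is
    # lowercased because Windows machine names are case-insensitive.
    return uuid.uuid5(SLAVE_ID_NAMESPACE, f"{mac:012x}/{machine_name.lower()}")


def current_machine_id() -> uuid.UUID:
    # uuid.getnode() returns the hardware (MAC) address as a 48-bit integer.
    return machine_id(uuid.getnode(), platform.node())
```

Note that `uuid.getnode()` falls back to a random value when no hardware address can be found, so on such machines the ID would change between runs.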

comment:56 Changed 13 years ago by ascheibe

r5790 don't save the unique machine id

comment:57 Changed 13 years ago by cneumuel

  • Description modified (diff)

r5793

  • implemented correct downloading of paused jobs; it's now also possible to change parameters and resume an algorithm
  • removed Prepare() calls in ExperimentManager and in the slave, as they prevent correct resuming of paused jobs
  • made events in ItemTreeView be invoked in the correct thread
  • reduced log output in ExperimentManager

comment:58 Changed 13 years ago by ascheibe

r5795 various slave and slave tray icon improvements

comment:59 Changed 13 years ago by cneumuel

r5797

  • ItemTreeView robustifications
  • compactified the layout in HiveJobView

comment:60 Changed 13 years ago by ascheibe

r5826 slave ui now receives status information and displays it in doughnut chart

comment:61 Changed 13 years ago by cneumuel

r5955

  • separated ExperimentMangerClient (OKB-style, contains business logic) and HiveExperiment (mainly only contains information)
  • fixed redundant cloning methods in dtos
  • added simple statistics in HiveExperiment which the user can see before downloading an experiment
  • added db-delete cascade for slaves and statelogs - now slaves can be safely deleted

comment:62 Changed 13 years ago by cneumuel

r5958 initial port of HiveEngine

Last edited 13 years ago by cneumuel (previous) (diff)

comment:63 Changed 13 years ago by cneumuel

r6000 :)

  • added GetPlugin service method
  • fixed minor issues with double plugins in database
  • worked on HiveEngine
  • fixed wrong role name for Hive User
  • fixed bug in group assignment of slaves

comment:64 Changed 13 years ago by ascheibe

r6004

  • fix pause/stop bug when serializing big experiments
  • use proper newlines
  • use GetPlugin(..) instead of GetPlugins()

comment:65 Changed 13 years ago by cneumuel

r6006

  • changed relationship between Job and HiveExperiment. There is no more HiveExperiment.RootJobId, instead there is Job.HiveExperimentId.
  • one HiveExperiment can now have multiple Experiments.
  • TreeView supports multiple root nodes
  • HiveEngine creates a HiveExperiment for each set of jobs, so jobs cannot be without a parent experiment anymore (no more loose jobs)
  • updated ExperimentManager binaries

comment:66 Changed 13 years ago by ascheibe

r6008

  • increase timeout when sending (for sending large jobs/lots of plugins)
  • handle failed GetPluginDatas() properly

comment:67 Changed 13 years ago by gkronber

GetPluginDatas() is a strange identifier. Plural of data is data.

comment:68 Changed 13 years ago by cneumuel

r6033

  • created a base class for jobs (ItemJob) from which OperatorJobs and EngineJobs derive
  • created special view for OptimizerJobs which derives from a more general view
  • removed logic from domain class HiveExperiment and moved it into RefreshableHiveExperiment
  • improved ItemTreeView
  • corrected plugin dependencies
  • fixed bug in database trigger when deleting HiveExperiments
  • added delete cascade for Plugin and PluginData
  • lots of fixes

comment:69 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:70 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:71 Changed 13 years ago by ascheibe

r6100

  • Executor now sends all exceptions to the ExperimentManager, as NetNamedPipe communication won't be possible in a sandbox due to security constraints
  • count stopped and aborted jobs correctly
  • send correct status when a job is stopped by the ExperimentManager
  • try to log unhandled exceptions to gui if no EventLog is available
  • don't crash if job is sent more than once by server

comment:72 Changed 13 years ago by ascheibe

  • Description modified (diff)

r6101

  • don't lock engines for so long in StartJobInAppDomain
  • move SlaveCommListener to ConsoleClient
  • delete orphaned job folders at startup

comment:73 Changed 13 years ago by ascheibe

r6107

  • simplify PreparePlugins
  • send more exceptions to ExperimentManager

comment:74 Changed 13 years ago by cneumuel

r6110

  • renamed engines to executors
  • changed locking in StartJobInAppDomain
  • avoid destruction of proxy object after 5 minutes for Slave.Core
  • added JobStarted event and fixed ExecutionStateChanged and ExecutionTimeChanged
  • slaves which are moved to another slavegroup will pause their jobs now, if they must not calculate them

comment:75 Changed 13 years ago by cneumuel

r6111 improved the way jobs are downloaded by ExperimentManager and HiveEngine

comment:76 Changed 13 years ago by ascheibe

r6112

  • HeartbeatManager: don't sleep while starting jobs
  • Executor: make Start() blocking
  • shutdown properly if an uncaught exception is thrown

comment:77 Changed 13 years ago by ascheibe

r6116

  • SlaveTrayIcon: don't try to kill TrayIcons from other users
  • split installer to fix config installer bug for users who did not run the installer

comment:78 Changed 13 years ago by ascheibe

r6166 forgot to check in HL icon for installers

comment:79 Changed 13 years ago by ascheibe

r6167

  • increased send/receive timeout
  • renamed hive binding name

comment:80 Changed 13 years ago by cneumuel

r6168

  • removed Job-dto objects from slave core (since it stores outdated objects)
  • added command textbox to HiveJobView
  • improved the way the control buttons behave in HiveJobView
  • improved job control (pausing and stopping are also possible when a job is not currently calculating)
  • improved gantt chart view (last state log entry is also displayed)
  • unified code for downloading jobs between experiment manager and hive engine

comment:81 Changed 13 years ago by ascheibe

r6175 temporary switch to privileged sandboxing until communication between core and executor works with sandboxing

comment:82 Changed 13 years ago by cneumuel

r6178

  • added semaphores to ensure an appdomain is never unloaded when the start method has not finished
  • HiveEngine uploading and downloading of jobs works and is displayed in the view
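The semaphore from r6178 makes unloading wait until a running start has returned. A minimal Python sketch of that guard, with a hypothetical `SandboxHandle` standing in for the AppDomain; a plain lock would work just as well, the semaphore only mirrors the wording of the changeset.

```python
import threading
from typing import Callable


class SandboxHandle:
    """Hypothetical stand-in for an AppDomain that hosts one job."""

    def __init__(self) -> None:
        # Held for the whole duration of start(); unload() must acquire it too.
        self._start_guard = threading.Semaphore(1)
        self.unloaded = False
        self.started = False

    def start(self, run_startup: Callable[[], None]) -> bool:
        with self._start_guard:
            if self.unloaded:
                return False  # unload won the race; do not start a dead sandbox
            run_startup()
            self.started = True
            return True

    def unload(self) -> None:
        # Blocks until a running start() has returned, so the sandbox is never
        # torn down halfway through job startup.
        with self._start_guard:
            self.unloaded = True
```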

comment:83 Changed 13 years ago by ascheibe

r6203

  • dropped dependency of Core from Executor
  • enabled sandboxing
  • moved most parts of Job handling from Core to SlaveJob to simplify locking
  • optimized how UsedCores is handled
  • SlaveStatusInfo is now thread-safe and counts jobs more correctly

comment:84 Changed 13 years ago by ascheibe

r6204 don't crash on shutdown

comment:85 Changed 13 years ago by cneumuel

r6212 created HiveEngine.Views plugin

comment:86 Changed 13 years ago by ascheibe

r6216

  • make UsedCores more reliable
  • some cosmetic fixes

comment:87 Changed 13 years ago by cneumuel

r6219 improved exception handling for hive experiments

comment:88 Changed 13 years ago by ascheibe

r6225

  • Slave UI now uses tab pages
  • balloon tips are displayed on receiving new jobs

comment:89 Changed 13 years ago by cneumuel

  • Description modified (diff)

r6229

  • added basic statistics recording (once per minute) for
    • executiontime per user
    • usedcores, usedmemory per slave

comment:90 Changed 13 years ago by ascheibe

r6230

  • don't set every view as default in slave ui
  • fixed bug in PluginCache where files got accessed by multiple threads

comment:91 Changed 13 years ago by ascheibe

r6248

  • don't set job failed if JobNotFoundException is thrown
  • disable AboutView for all items
  • avoid NullRefException in SendFinishedJob

comment:92 Changed 13 years ago by ascheibe

r6257

  • added UAC self elevation for start/stop of windows service
  • added slave states and simplified ui commands

comment:93 Changed 13 years ago by ascheibe

r6263

  • added view for displaying jobs
  • improved slave ui

comment:94 Changed 13 years ago by ascheibe

  • Description modified (diff)

comment:95 Changed 13 years ago by cneumuel

  • Description modified (diff)

r6267

  • extended statistics recording:
    • execution times of users are captured
    • execution times and start-to-finish times of finished jobs are captured (to compute Hive overhead)
    • data of deleted jobs is automatically captured in DeletedJobStatistics
  • changed ExecutionTime type in database from string to float (milliseconds are stored instead of TimeSpan.ToString())
  • added IsPrivileged field to job to indicate if it should be executed in a privileged sandbox
  • added CpuUtilization field to slave to be able to report cpu utilization
  • added GetJobsByResourceId to retrieve all jobs which are currently being calculated on a slave (or slave group)
  • TransactionManager now allows the use of serializable transactions (used for the lifecycle trigger)
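Storing execution time as milliseconds in a float column makes it easy to aggregate, and together with the start-to-finish time it gives a simple overhead measure. A sketch, assuming overhead is defined as start-to-finish time minus execution time and productivity as their ratio; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def to_milliseconds(duration: timedelta) -> float:
    # A float column can be summed and averaged by the database directly,
    # unlike the string produced by TimeSpan.ToString().
    return duration.total_seconds() * 1000.0


def hive_overhead_ms(created: datetime, finished: datetime, execution_ms: float) -> float:
    # Assumed definition: time a job spent in Hive without calculating
    # (waiting, transfers, ...).
    start_to_finish_ms = to_milliseconds(finished - created)
    return max(0.0, start_to_finish_ms - execution_ms)


def productivity(created: datetime, finished: datetime, execution_ms: float) -> float:
    # Share of the start-to-finish time that was actual computation.
    start_to_finish_ms = to_milliseconds(finished - created)
    return execution_ms / start_to_finish_ms if start_to_finish_ms > 0 else 0.0
```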

comment:96 Changed 13 years ago by cneumuel

  • Description modified (diff)

r6269 added CpuUtilization to heartbeats

comment:97 Changed 13 years ago by cneumuel

r6357

  • refactoring of slave core
  • created JobManager, which is responsible for managing jobs without knowing anything about the service; this class is easier to test than the slave core
  • lots of cleanup
  • created console test project for slave

comment:98 Changed 13 years ago by cneumuel

r6362 changed roles authentication to use an AuthenticationManager instead of method attributes. this makes unit tests easier.

comment:99 Changed 13 years ago by cneumuel

  • Description modified (diff)

r6369

  • added consideration of appointments in heartbeats
  • code cleanup

comment:100 Changed 13 years ago by ascheibe

  • Description modified (diff)

r6371

  • code cleanups for slave review
  • added switch between privileged and unprivileged sandbox
  • removed childjob management because it's not used

comment:101 Changed 13 years ago by ascheibe

r6372 changed year to 2011

comment:102 Changed 13 years ago by cneumuel

r6373

  • moved ExperimentManager into separate plugin
  • moved Administration into separate plugin

comment:103 Changed 13 years ago by cneumuel

r6381

  • added locking for childHiveJobs in OptimizerHiveJob to avoid multithreaded access issues
  • added IsPrivileged to gui
  • minor changes

comment:104 Changed 13 years ago by ascheibe

r6407

  • implemented usage of checksums for comparing assemblies
  • re-added CreateHiveDatabaseApplication.cs to project
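Comparing checksums instead of names and versions also covers the earlier Slave item about modified assemblies that were not downloaded again. A sketch of the comparison; SHA-256 and the function names are assumptions, since the ticket does not say which checksum Hive uses.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    # SHA-256 is an assumption; any stable hash works for change detection.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(cached: Path, server_checksum: str) -> bool:
    # A modified assembly with unchanged name and version still gets a new
    # checksum, so it is downloaded again instead of reusing the cached copy.
    return not cached.exists() or file_checksum(cached) != server_checksum
```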

comment:105 Changed 13 years ago by abeham

r6418

  • fixed references that used absolute paths

comment:106 Changed 13 years ago by cneumuel

r6419, r6420

  • created events when statelog changed
  • fixed memory leak in hiveengine
  • extended timeout for long running transactions and database contexts (when jobdata is stored)
  • replaced random guids in database with sequential guids for performance reasons
  • minor fixes and cleanups
  • updated hive binaries
  • updated statistics
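Random GUIDs as clustered primary keys scatter inserts across the whole index, which causes page splits and fragmentation; sequential GUIDs make new rows land at the end. SQL Server offers NEWSEQUENTIALID() for this, but the ticket does not say whether Hive used it or generated the IDs in code. A sketch of a COMB-style generator in code, assuming SQL Server's ordering of uniqueidentifier values, which compares the last six bytes first; `sequential_guid` is a hypothetical name.

```python
import os
import threading
import time
import uuid

_lock = threading.Lock()
_last = 0


def sequential_guid() -> uuid.UUID:
    """COMB-style GUID: 10 random bytes followed by a 6-byte increasing suffix."""
    global _last
    with _lock:
        # Strictly increasing 48-bit value: the current Unix time in ms, bumped
        # by one if several GUIDs are requested within the same millisecond.
        _last = max(int(time.time() * 1000), _last + 1)
        suffix = _last.to_bytes(6, "big")
    # SQL Server compares uniqueidentifier values starting with the last six
    # bytes, so putting the increasing part there makes new rows sort last.
    return uuid.UUID(bytes=os.urandom(10) + suffix)
```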

comment:107 Changed 13 years ago by abeham

r6422

  • synchronized config file with that from trunk

comment:108 Changed 13 years ago by abeham

  • Description modified (diff)

Added TODO point regarding the deletion of jobs in the experiment manager

comment:109 Changed 13 years ago by ascheibe

r6426 removed useLocalPlugins

comment:110 Changed 13 years ago by cneumuel

r6431 - applied some review comments

General:

  • changed Log to ThreadSafeLog
  • added license information to all files
  • added assembly descriptions
  • moved using directives before the namespace declaration

HeuristicLab.Services.Hive.DataAccess:

  • made TransactionManager static
  • removed DaoException
  • removed TimeSpanExtensions
  • renamed prepareHiveDatabase.sql to Prepare Hive Database.sql
  • created Initialize Hive Database.sql

comment:111 Changed 13 years ago by cneumuel

r6435

  • some cleanup in HiveEngine
  • using ThreadSafeLog instead of synchronized methods

comment:112 Changed 13 years ago by ascheibe

r6437 Admin UI:

  • some bugfixes
  • removed dummy stuff

comment:113 Changed 13 years ago by cneumuel

r6444

  • stability improvements for HiveExperiment and HiveEngine
  • parallelized upload of jobs
  • enabled cancellation of job upload
  • reduced the number of double assignments of jobs by an additional check in HeartbeatManager
  • tried to reduce the number of deadlocks by automatically rerunning transactions
  • some fixes
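Rerunning a transaction that was chosen as a deadlock victim is safe because the database has already rolled it back completely. A minimal retry wrapper; `DeadlockError` and `run_with_retry` are hypothetical names (on SQL Server the deadlock victim receives error 1205), and the number of attempts and the delay are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for a 'chosen as deadlock victim' database error."""


def run_with_retry(transaction: Callable[[], T], max_attempts: int = 3,
                   base_delay_s: float = 0.05) -> T:
    """Run transaction, rerunning it if it is chosen as a deadlock victim."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the deadlock
            # Wait a bit longer after each failure so the competing
            # transaction can finish first.
            time.sleep(base_delay_s * attempt)
    raise AssertionError("unreachable")
```

The transaction callable must redo all of its work from scratch on each attempt, since nothing from the rolled-back attempt survives.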

comment:114 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:115 Changed 13 years ago by cneumuel

  • Description modified (diff)

comment:116 Changed 13 years ago by cneumuel

  • Description modified (diff)