
Changes between Version 118 and Version 127 of Ticket #1233

Timestamp:: 06/28/11 10:03:51
Author:: cneumuel
Comment:

Ticket #1233 – Description

-                      v118
+                      v127
  * ~~Make WCF service completely stateless. Put all remaining state-information into the database (latestHeartbeats, latestConsistencyCheck, newlyAssignedJobs (remove completely and solve by adding a heartbeat))~~
  * ~~`StateLog`: Log state transitions of jobs.~~
- * Statistics
+ * ~~Statistics~~
   * ~~Measure core capacity and utilization every minute~~
   * ~~Measure CPU and memory capacity and utilization every minute~~
 …
  * ~~Show jobs in treeview. Would greatly save screen space and navigation-clicks~~
   * ~~to be enhanced (event wiring)~~
- * Sort `HiveExperiments` alphabetically
+ * ~~Sort `HiveExperiments` alphabetically~~
  * ~~Plugin-Upload (optional)~~
- * Experiment Sharing
+ * ~~Experiment Sharing~~
  * ~~Appropriate numbering of Runs~~
  * ~~Use Service-Call pattern from OKB (or PPOV-Cockpit)~~
 …
  * ~~`HiveEngine` jobs should have a `HiveExperiment`, which is marked, so a user cannot see it in `HiveExperimentManager`. However it should be visible in Administration GUI. If a Hive Engine crashes and cannot delete the experiment, this should be detected by the server and it should be automatically deleted.~~
  * ~~Improve `HiveEngine` View (list of jobs, with status etc.)~~
- * Stabilize
+ * ~~Stabilize~~
 === Administration ===
 …
  * ~~build Observable Collections for `Users/Slaves/Groups`~~
  * ~~add `ContentViews` for Users and `SlaveGroups`~~
- * show some fancy statistics
+ * ~~show some fancy statistics~~
  * ~~add Save Button~~
  * ~~integrate `HeuristicLab.Services.Hive.Common-3.4` in Server~~
 …
    * only `Full` and `Read` permissions are necessary (`Read`: just read!, `Full`: control, delete, grant permissions)
  * remove `LastAccessed` and `IsHiveEngine`. There should be a category field instead.
+=== Remarks for the future (cneumuel) ^(28.06.2011)^ ===
+'''Security'''
+ * `GetPlugins` currently returns all plugins from the server, which exposes every uploaded assembly. When confidentiality of plugins matters, this method should be removed so that only `GetPlugin(s)ById` and `GetPlugin(s)ByHash` are available.
+ * Slave user: Each Hive slave uses the same username and password, and a slave is allowed to download jobs. When a slave downloads a job, the server should check whether the job is assigned to this slave or to one of its parent slave groups (not implemented yet). Even with this check, an attacker who knows the ID of another slave could still fake that ID and gain access to its jobs.
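The download check described in the slave-user remark can be sketched as a walk from the requesting slave up through its parent slave groups. This is a minimal Python illustration, not Hive code: the function `may_download_job`, the `parent_of` mapping and the assumption that a job stores the set of resource IDs it is assigned to are all hypothetical.

```python
from typing import Dict, Optional, Set


def may_download_job(slave_id: str,
                     assigned_resources: Set[str],
                     parent_of: Dict[str, Optional[str]]) -> bool:
    """Return True if the job is assigned to this slave or to one of its ancestor groups.

    Hypothetical model: parent_of maps a resource ID (slave or slave group)
    to its parent group ID, or None for a top-level resource.
    """
    visited: Set[str] = set()
    current: Optional[str] = slave_id
    # Walk from the slave up through its parent slave groups.
    while current is not None and current not in visited:
        if current in assigned_resources:
            return True
        visited.add(current)  # guards against accidental cycles in the hierarchy
        current = parent_of.get(current)
    return False


if __name__ == "__main__":
    hierarchy = {"slave-1": "group-a", "group-a": "root", "root": None, "slave-2": None}
    print(may_download_job("slave-1", {"root"}, hierarchy))     # True: root is an ancestor
    print(may_download_job("slave-2", {"group-a"}, hierarchy))  # False: not in group-a
```

As the remark points out, such a check only ties a job to the slave ID the caller claims. Because all slaves share one set of credentials, it does not stop an attacker from presenting another slave's ID.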
+'''Statistics'''
+ * Further measures to include (as total sums; deleted jobs should also be kept in `DeletedJobStatistics`):
+   * Globally: !FinishedJobs, !WaitingJobs, !FailedJobs, !AbortedJobs, !TransferringJobs, !PausedJobs
+   * Per user: total jobdata-size (MB)
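The proposed measures could be computed as plain aggregations over both active and deleted jobs. The sketch below assumes a hypothetical `JobStat` record used for live jobs and for rows kept in `DeletedJobStatistics` (with `state` taken as the job's last state), and reads "MB" as 2^20 bytes; none of these names come from the Hive code base.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import chain
from typing import Dict, Iterable

MIB = 1024 * 1024  # assumption: "MB" is read as 2**20 bytes


@dataclass(frozen=True)
class JobStat:
    """Hypothetical per-job record; the real Hive tables may look different."""
    owner: str
    state: str  # e.g. "Finished", "Waiting", "Failed", "Aborted", "Transferring", "Paused"
    data_size_bytes: int


def global_state_totals(active: Iterable[JobStat], deleted: Iterable[JobStat]) -> Counter:
    """Total number of jobs per (last) state, including jobs that were already deleted."""
    return Counter(job.state for job in chain(active, deleted))


def job_data_mb_per_user(active: Iterable[JobStat], deleted: Iterable[JobStat]) -> Dict[str, float]:
    """Total job-data size per user in MB, including deleted jobs."""
    totals: Dict[str, int] = {}
    for job in chain(active, deleted):
        totals[job.owner] = totals.get(job.owner, 0) + job.data_size_bytes
    return {owner: size / MIB for owner, size in totals.items()}


if __name__ == "__main__":
    active = [JobStat("alice", "Finished", 2 * MIB), JobStat("bob", "Waiting", MIB)]
    deleted = [JobStat("alice", "Failed", MIB)]
    print(global_state_totals(active, deleted))
    print(job_data_mb_per_user(active, deleted))  # {'alice': 3.0, 'bob': 1.0}
```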
+'''Server performance'''
+ * An increasing number of slaves puts pressure on the server, leading to longer response times and occasional deadlocks. Ideas to resolve:
+   * Increase the heartbeat interval (possibly dynamically as the number of slaves grows). Remember to increase `SlaveHeartbeatTimeout` in the web.config as well.
+   * Make `GetWaitingJobs` faster by using a stored procedure, or use a job queue instead of querying the whole job table.
+ * Large jobs (>15 MB) sometimes result in database timeouts, especially if several of them are uploaded concurrently. Ideas to resolve:
+   * Use `Filestream` as the DB type instead of `Varbinary`, as it is supposed to be faster for large data blobs.
+   * As streaming is not an option (no security or encryption), using a `chunking channel` could work (http://msdn.microsoft.com/en-us/library/aa717050.aspx).
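Two of the ideas above can be illustrated in a few lines: a heartbeat interval that grows with the number of slaves (with the timeout scaled alongside it, mirroring the note about `SlaveHeartbeatTimeout`), and splitting a large job blob into fixed-size chunks that are reassembled and checksum-verified on the receiving side. This is a language-neutral sketch in Python, not the WCF chunking channel from the linked sample; all constants and function names are hypothetical.

```python
import hashlib
from typing import List

# --- Dynamic heartbeat interval (hypothetical constants, not Hive defaults) ---
BASE_HEARTBEAT_S = 20   # interval for small deployments
SLAVES_PER_STEP = 100   # add one base interval per 100 slaves
TIMEOUT_FACTOR = 3      # the timeout must stay well above the interval


def heartbeat_interval(num_slaves: int) -> int:
    """Heartbeat interval in seconds, growing stepwise with the number of slaves."""
    return BASE_HEARTBEAT_S * (1 + num_slaves // SLAVES_PER_STEP)


def heartbeat_timeout(num_slaves: int) -> int:
    """Counterpart of SlaveHeartbeatTimeout: grows together with the interval."""
    return TIMEOUT_FACTOR * heartbeat_interval(num_slaves)


# --- Chunked transfer of large job data ---
CHUNK_SIZE = 1024 * 1024  # 1 MiB per message (assumed)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a job blob into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(chunks: List[bytes], expected_sha256: str) -> bytes:
    """Join chunks (assumed to arrive in order) and verify the result against a SHA-256 digest."""
    data = b"".join(chunks)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("job data incomplete or corrupted")
    return data


if __name__ == "__main__":
    blob = bytes(range(256)) * 10000  # about 2.4 MiB of dummy job data
    digest = hashlib.sha256(blob).hexdigest()
    chunks = split_into_chunks(blob)
    print(len(chunks))  # 3
    assert reassemble(chunks, digest) == blob
    print(heartbeat_interval(250), heartbeat_timeout(250))  # 60 180
```

In a real chunked transfer each chunk would travel in its own message, so no single request has to carry the full payload of a large job; the SHA-256 check is an assumed addition for detecting missing or corrupted chunks.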
+'''Scheduling'''\\
+Some ideas for a scheduler:
+ * 3 levels of priorities:
+   * Job priority (fixed at upload)
+   * User priority (fixed)
+   * Time (dynamic: `f(Now-Uploaded)`)
+ * These three priority values are aggregated (e.g. by average or (weighted) sum) into the final priority by which the jobs are ordered.
+ * Fast-slaves-first: Faster slaves get the jobs first, slower slaves later. This would require:
+   * Performance index: Let each slave calculate a benchmark job before it is used.
+   * Job queues per slave: Right now every slave that sends a heartbeat gets a job (if one is available). One queue per slave would allow the server to actively assign jobs to slaves. Such queues could also ease performance issues and race conditions.
+ * Re-scheduling: Sometimes fast slaves finish their jobs while slow slaves are still calculating. In those cases it might be reasonable to pause the jobs on the slow slaves and have them calculated on the faster slaves.
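One possible reading of the scheduling ideas: combine job, user and time priority as a weighted sum, then hand the highest-ranked jobs to the idle slaves with the best benchmark score. The weights, the linear time function and the `benchmark_score` field are assumptions; the remarks only name the three priority levels, the aggregation options and a per-slave performance index.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Job:
    name: str
    job_priority: float   # fixed at upload
    user_priority: float  # fixed per user
    uploaded: datetime


@dataclass
class Slave:
    name: str
    benchmark_score: float  # hypothetical performance index; higher means faster


# Assumed weights for (job, user, time); the remarks leave the aggregation open.
WEIGHTS = (1.0, 1.0, 0.5)


def time_priority(now: datetime, uploaded: datetime) -> float:
    """f(Now - Uploaded): grows linearly with waiting time in hours (assumed)."""
    return (now - uploaded).total_seconds() / 3600.0


def final_priority(job: Job, now: datetime) -> float:
    """Weighted sum of job, user and time priority."""
    w_job, w_user, w_time = WEIGHTS
    return (w_job * job.job_priority
            + w_user * job.user_priority
            + w_time * time_priority(now, job.uploaded))


def assign_fast_slaves_first(jobs: List[Job], idle_slaves: List[Slave],
                             now: datetime) -> List[Tuple[str, str]]:
    """Pair the highest-priority jobs with the fastest idle slaves."""
    ranked_jobs = sorted(jobs, key=lambda j: final_priority(j, now), reverse=True)
    ranked_slaves = sorted(idle_slaves, key=lambda s: s.benchmark_score, reverse=True)
    return [(s.name, j.name) for s, j in zip(ranked_slaves, ranked_jobs)]


if __name__ == "__main__":
    now = datetime(2011, 6, 28, 12, 0)
    jobs = [Job("A", 1, 1, now - timedelta(hours=10)),
            Job("B", 3, 2, now - timedelta(hours=1)),
            Job("C", 0, 0, now)]
    slaves = [Slave("slow", 10.0), Slave("fast", 50.0)]
    print(assign_fast_slaves_first(jobs, slaves, now))  # [('fast', 'A'), ('slow', 'B')]
```

With a weight of 0.5 on waiting hours, a job uploaded ten hours ago with low fixed priorities (1 + 1 + 5 = 7) overtakes a recent job with higher fixed priorities (3 + 2 + 0.5 = 5.5). This is the aging effect the time component is meant to provide, while job C stays in the queue until a slave becomes free.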