
Changes between Version 118 and Version 127 of Ticket #1233


Timestamp:
06/28/11 10:03:51 (13 years ago)
Author:
cneumuel

  • Ticket #1233 – Description

    v118 v127  
    88 * ~~Make WCF service completely stateless. Put all remaining state-information into the database (latestHeartbeats, latestConsistencyCheck, newlyAssignedJobs (remove completely and solve by adding a heartbeat))~~
    99 * ~~`StateLog`: Log state transitions of jobs.~~
    10  * Statistics
     10 * ~~Statistics~~
    1111  * ~~Measure core capacity and utilization every minute~~
    1212  * ~~Measure CPU and memory capacity and utilization every minute~~
     
    4747 * ~~Show jobs in treeview. Would greatly save screen space and navigation-clicks~~
    4848  * ~~to be enhanced (event wiring)~~
    49  * Sort `HiveExperiments` alphabetically
     49 * ~~Sort `HiveExperiments` alphabetically~~
    5050 * ~~Plugin-Upload (optional)~~
    51  * Experiment Sharing
     51 * ~~Experiment Sharing~~
    5252 * ~~Appropriate numbering of Runs~~
    5353 * ~~Use Service-Call pattern from OKB (or PPOV-Cockpit)~~
     
    6060 * ~~`HiveEngine` jobs should have a `HiveExperiment`, which is marked, so a user cannot see it in `HiveExperimentManager`. However it should be visible in Administration GUI. If a Hive Engine crashes and cannot delete the experiment, this should be detected by the server and it should be automatically deleted.~~
    6161 * ~~Improve `HiveEngine` View (list of jobs, with status etc.)~~
    62  * Stabilize
     62 * ~~Stabilize~~
    6363
    6464=== Administration ===
     
    7777 * ~~build Observable Collections for `Users/Slaves/Groups`~~
    7878 * ~~add `ContentViews` for Users and `SlaveGroups`~~
    79  * show some fancy statistics
     79 * ~~show some fancy statistics~~
    8080 * ~~add Save Button~~
    8181 * ~~integrate `HeuristicLab.Services.Hive.Common-3.4` in Server~~
     
    103103   * only `Full` and `Read` permissions are necessary (`Read`: just read!, `Full`: control, delete, grant permissions)
    104104 * remove `LastAccessed` and `IsHiveEngine`. there should be a category field instead.
     105
     106=== Remarks for the future (cneumuel) ^(28.06.2011)^ ===
     107'''Security'''
     108 * `GetPlugins` currently returns all plugins from the server. This exposes all uploaded assemblies. If confidentiality of plugins is relevant, this method should be removed and only `GetPlugin(s)ById` and `GetPlugin(s)ByHash` should be available.
     109 * Slave-user: Each hive slave uses the same username and password. A slave is allowed to download jobs. When a slave downloads a job, the server should check whether the job is assigned to this slave or to one of its parent slave groups (not implemented yet). However, an attacker could still fake the ID of another slave (if it is known) and get access to its jobs.
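The assignment check described above could look roughly like the following. This is only a sketch in Python, not the Hive server code: the function name `may_download`, the `parent_of` mapping and all IDs are hypothetical, and it assumes each resource has at most one parent group.

```python
from typing import Dict, Optional


def may_download(slave_id: str,
                 assigned_resource_id: str,
                 parent_of: Dict[str, Optional[str]]) -> bool:
    """Return True if the job's assigned resource is the slave itself
    or one of its (transitive) parent slave groups."""
    current: Optional[str] = slave_id
    visited = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in visited:
        if current == assigned_resource_id:
            return True
        visited.add(current)
        current = parent_of.get(current)
    return False


# Hypothetical hierarchy: two slaves in different groups under one root group.
example_parents = {
    "slave-a": "group-1",
    "slave-b": "group-2",
    "group-1": "all-slaves",
    "group-2": "all-slaves",
    "all-slaves": None,
}
```

With this hierarchy, `may_download("slave-a", "group-1", example_parents)` is allowed, while `may_download("slave-b", "group-1", example_parents)` is not. The check only helps if the slave ID itself can be trusted; it does not address the ID-faking problem mentioned above.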
     110
     111'''Statistics'''
     112 * Further measures to include (as total sums, also keep deleted jobs in `DeletedJobStatistics`):
     113   * Globally: !FinishedJobs, !WaitingJobs, !FailedJobs, !AbortedJobs, !TransferringJobs, !PausedJobs
     114   * Per user: total jobdata-size (MB)
     115
     116'''Server performance'''
     117 * An increasing number of slaves puts pressure on the server, causing longer response times and some deadlock situations. Ideas to resolve:
     118   * Increase heartbeat-interval (maybe dynamic when the number of slaves gets higher). Remember to increase the `SlaveHeartbeatTimeout` in the web.config too.
     119   * Make `GetWaitingJobs` faster by using a stored procedure, or use a job queue instead of querying the whole job table.
     120 * Large jobs (>15MB) sometimes result in database timeouts, especially if several of them are uploaded concurrently. Ideas to resolve:
     121   * Use `Filestream` as db-type instead of `Varbinary` as it is supposed to be faster for large data-blobs.
     122   * As streaming is not an option (no security, encryption), using a `chunking channel` could work (http://msdn.microsoft.com/en-us/library/aa717050.aspx).
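A dynamic heartbeat interval, as suggested above, might be derived from the number of connected slaves. The following Python sketch is illustrative only: the function names, the default values and the idea of capping the server-wide heartbeat rate are assumptions, not part of Hive. The second function reflects the reminder that `SlaveHeartbeatTimeout` has to grow together with the interval.

```python
def heartbeat_interval_seconds(slave_count: int,
                               base_interval: float = 10.0,
                               max_heartbeats_per_second: float = 5.0,
                               max_interval: float = 120.0) -> float:
    """Pick an interval so that all slaves together send at most
    max_heartbeats_per_second, bounded by base_interval and max_interval."""
    load_bound = slave_count / max_heartbeats_per_second
    return min(max_interval, max(base_interval, load_bound))


def heartbeat_timeout_seconds(interval: float, missed_allowed: int = 3) -> float:
    """A slave counts as timed out after missing this many heartbeats."""
    return interval * missed_allowed
```

With these assumed defaults, up to 50 slaves keep the base interval of 10 seconds, 100 slaves are slowed to 20 seconds, and the interval never exceeds 120 seconds.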
     123
     124
     125'''Scheduling'''\\
     126Some ideas for a scheduler:
     127 * 3 levels of priorities:
     128   * Job priority (fixed at upload)
     129   * User priority (fixed)
     130   * Time (dynamic: `f(Now-Uploaded)`)
     131 * Those 3 priority values are aggregated (average or (weighted) sum) to form the final priority by which the jobs are ordered.
     132 * Fast-slaves-first: Faster slaves get the jobs first, slow slaves later. This would require:
     133   * Performance-index: Let each slave calculate a benchmark-job before it is used.
     134   * Job queue per slave: Right now every slave that sends a heartbeat gets a job (if one is available). One queue per slave would allow the server to actively assign jobs to slaves. Such a queue could also ease performance issues and race conditions.
     135 * Re-scheduling: Sometimes fast slaves finish their jobs and slow slaves are still calculating. In those cases it might be reasonable to pause the jobs and have them calculated on the faster slaves.
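
The three-level priority idea above could be aggregated as a weighted sum. The Python sketch below is one possible reading, not a Hive implementation: the `Job` fields, the weights and the linear aging function standing in for `f(Now-Uploaded)` are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    job_priority: float   # fixed at upload
    user_priority: float  # fixed per user
    uploaded_at: float    # upload time in seconds


def final_priority(job: Job, now: float,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
                   aging_per_hour: float = 1.0) -> float:
    """Weighted sum of job priority, user priority and a time term
    that grows linearly with the waiting time (assumed form of f)."""
    waiting_hours = max(0.0, now - job.uploaded_at) / 3600.0
    time_priority = aging_per_hour * waiting_hours
    w_job, w_user, w_time = weights
    return (w_job * job.job_priority
            + w_user * job.user_priority
            + w_time * time_priority)


def order_jobs(jobs: List[Job], now: float) -> List[Job]:
    """Return jobs sorted so that the highest final priority comes first."""
    return sorted(jobs, key=lambda j: final_priority(j, now), reverse=True)
```

Because the time term keeps growing, a low-priority job that has waited long enough eventually overtakes newer high-priority jobs, which avoids starvation. The weights decide how quickly that happens.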