aurora-issues mailing list archives

From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-690) Add support for external update coordination
Date Mon, 20 Oct 2014 23:10:35 GMT

     [ https://issues.apache.org/jira/browse/AURORA-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Farner updated AURORA-690:
-------------------------------
    Assignee:     (was: Maxim Khutornenko)

> Add support for external update coordination
> --------------------------------------------
>
>                 Key: AURORA-690
>                 URL: https://issues.apache.org/jira/browse/AURORA-690
>             Project: Aurora
>          Issue Type: Story
>          Components: Client, Scheduler
>            Reporter: Maxim Khutornenko
>            Priority: Critical
>
> With the introduction of scheduler-driven job update orchestration (AURORA-610), it will
be a bit harder for a user to interrupt a job update that has gone wrong (e.g. bad binary,
incorrect settings, changed external conditions, etc.). Instead of aborting the update
process via CTRL-C (client updater), users would have to run an abort/pause command that risks
never reaching the scheduler if the client is cut off by a network partition.
> To compensate for the above, it would be great for the scheduler to optionally support an
inverted dependency model where the updater would willingly pause job update progress upon
reaching certain checkpoints and wait for the client/external service to explicitly "ack"
it (e.g. via a resumeJobUpdate RPC). Such checkpoints could be:
> - predefined number of instances reached
> - percentage of completion
> - time-based heartbeat (HB) intervals
> Arguably, the time-based HB approach would be the most versatile, addressing the majority
of cases.
> Generalizing further, this feature would be useful for building external update coordination
services where Aurora service job upgrades are controlled by application-specific health tracking
systems that throttle individual job updates based on internal health/traffic metrics.
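
The time-based heartbeat checkpoint described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Aurora's implementation: the class `HeartbeatGatedUpdate`, its methods, and the injectable `clock` parameter are all hypothetical names, and `heartbeat()` merely stands in for the kind of explicit "ack" (such as the resumeJobUpdate RPC) the description proposes.

```python
import time


class HeartbeatGatedUpdate:
    """Toy model (hypothetical, not an Aurora API) of an update that
    pauses itself when the external coordinator stops sending heartbeats."""

    def __init__(self, total_instances, heartbeat_interval_secs, clock=time.monotonic):
        self.total = total_instances
        self.interval = heartbeat_interval_secs
        self.clock = clock  # injectable so the behaviour can be exercised without sleeping
        self.updated = 0
        self.paused = False
        self.last_heartbeat = clock()

    def heartbeat(self):
        """External "ack": record the arrival time and resume if paused."""
        self.last_heartbeat = self.clock()
        self.paused = False

    def step(self):
        """Try to update one instance; return True if one was updated."""
        if self.updated >= self.total:
            return False  # update already complete
        if self.paused:
            return False  # stay paused until the next heartbeat arrives
        if self.clock() - self.last_heartbeat > self.interval:
            self.paused = True  # heartbeat overdue: pause instead of proceeding
            return False
        self.updated += 1
        return True
```

The point of the inverted dependency is visible in `step()`: if the client is partitioned away, the update stops on its own because no heartbeat arrives, rather than relying on an abort command reaching the scheduler.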



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
