aurora-dev mailing list archives

From Cody G <codyhg...@gmail.com>
Subject Re: Idea: rolling restarts in Aurora
Date Fri, 17 Mar 2017 21:56:42 GMT
I've drafted a small design document for this change:

https://docs.google.com/document/d/13xm23SfIRy5zMro82Ok8dRCsr7lKcC0_UUO_tJX21wQ/edit?usp=sharing

Any feedback would be greatly appreciated!

On Tue, Mar 7, 2017 at 11:15 AM, Cody G <codyhgibb@gmail.com> wrote:

> Created a ticket https://issues.apache.org/jira/browse/AURORA-1900 and
> assigned to myself.
>
> On Fri, Mar 3, 2017 at 11:29 AM, David McLaughlin <dmclaughlin@apache.org>
> wrote:
>
>> +1 for thinner client.
>>
>> Another reason rolling update was moved to the Scheduler was to have an
>> audit trail of changes to the job. If we could also get these restarts
>> appearing on the job page, it would be great.
>>
>> On Fri, Mar 3, 2017 at 11:15 AM, Zameer Manji <zmanji@apache.org> wrote:
>>
>> > +1
>> >
>> > If I recall correctly, the rolling update mechanism was added to Aurora
>> > because having the client coordinate batching was pretty tricky. I think
>> > the same applies here to a rolling restart.
>> >
>> > Considering the job controller technically supports this, adding a new
>> > RPC to expose this behaviour would be beneficial.
>> >
>> > On Thu, Mar 2, 2017 at 7:40 PM, Cody G <codyhgibb@gmail.com> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I'd like to implement some new functionality in Aurora allowing for
>> > > rolling job restarts. There are many reasons why we might need to
>> > > restart a job, e.g. freeing instances of a job from deadlock or
>> > > refreshing some sort of external configuration.
>> > >
>> > > Currently, there are two options to execute a rolling restart, but
>> > > both are undesirable: either use the restartShards endpoint and
>> > > implement batching client-side, or use startJobUpdate with a slightly
>> > > modified task config so that a non-empty job diff forces an update. I
>> > > propose adding a new thrift RPC for launching a rolling restart, which
>> > > is an interface around the existing upgrade logic. Instead of
>> > > requiring a TaskConfig and instanceCount, this restart endpoint will
>> > > only accept JobUpdateSettings and will simply launch an update with
>> > > the currently used task configuration. All of the existing job update
>> > > RPCs will still be able to access updates which were launched from
>> > > this restart endpoint. This ensures restarts are available in the UI
>> > > and no additional storage changes are required.
>> > >
>> > > If this proposal seems reasonable, I'll file a ticket and draft up a
>> > > more detailed RFC for further review.
>> > >
>> > > Cody
>> > >
>> > > --
>> > > Zameer Manji
>> > >
>> >
>>
>
>
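
For readers skimming the thread, the proposal in the original message can be illustrated with a rough sketch. This is a toy in-memory model, not Aurora's scheduler or its thrift API: `ToyScheduler`, `Job`, `UpdateSettings`, `start_rolling_restart`, and the job key string are hypothetical names made up for illustration, and only a batch size is modeled out of whatever JobUpdateSettings actually carries. It shows the one idea from the thread: the restart call takes only the update settings and reuses the job's current task configuration, going through the same path as a normal update so the restart is recorded alongside other updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class UpdateSettings:
    """Hypothetical stand-in for JobUpdateSettings; only a batch size is modeled."""
    update_group_size: int = 1

    def __post_init__(self):
        if self.update_group_size < 1:
            raise ValueError("update_group_size must be at least 1")


@dataclass
class Job:
    task_config: Dict[str, str]
    instance_count: int


@dataclass
class ToyScheduler:
    """Toy model for illustration only; not how Aurora is implemented."""
    jobs: Dict[str, Job] = field(default_factory=dict)
    updates: List[dict] = field(default_factory=list)

    def start_job_update(self, job_key, task_config, instance_count, settings):
        # Split instance ids into consecutive batches of the configured size.
        size = settings.update_group_size
        batches = [
            list(range(start, min(start + size, instance_count)))
            for start in range(0, instance_count, size)
        ]
        update = {"job": job_key, "config": task_config, "batches": batches}
        # Every update is recorded in one place, so restarts show up there too.
        self.updates.append(update)
        return update

    def start_rolling_restart(self, job_key, settings):
        # The caller supplies only the settings; the task config and instance
        # count are taken from the job as it currently runs.
        job = self.jobs[job_key]
        return self.start_job_update(
            job_key, job.task_config, job.instance_count, settings
        )


scheduler = ToyScheduler(jobs={"example-job": Job({"image": "web:1"}, 5)})
restart = scheduler.start_rolling_restart(
    "example-job", UpdateSettings(update_group_size=2)
)
print(restart["batches"])
```

In this sketch the batching lives on the scheduler side, which is the point made in the replies about keeping the client thin.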
