aurora-dev mailing list archives

From Brian Brazil <brian.bra...@boxever.com>
Subject Re: Graceful task shutdown
Date Wed, 08 Apr 2015 14:11:15 GMT
On 7 April 2015 at 22:15, Brian Brazil <brian.brazil@boxever.com> wrote:

> On 7 April 2015 at 21:28, Erb, Stephan <Stephan.Erb@blue-yonder.com>
> wrote:
>
>> Brian, do you have any particular plans regarding your shutdown
>> requirements? I have seen that you have filed another issue [1] which is
>> also concerned with graceful shutdown.
>>
>
> Given this thread, I now only wish to hit a different endpoint than
> /quitquitquit (and I may as well do /abortabortabort while I'm at it). The
> rest consists of changes to our internal shutdown handling.
>
>
>> Stephan
>>
>> PS: For what it's worth, I implemented the 'quick fix' for the problem I
>> described at the beginning of this thread [2].
>>
>
> That's handy. While writing the code up today I noticed that hitting
> /quitquitquit had no unit tests. I hope to have that up for review tomorrow
> with unit tests, which you could build on to write a more end-to-end test
> for your code.
>

This is now up at https://reviews.apache.org/r/32973/

Brian
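A unit test for hitting /quitquitquit can run a stub HTTP service on an ephemeral port and assert that the expected path arrived. The sketch below is not the test in the review linked above; `send_quitquitquit` is a hypothetical stand-in for the client code under test, and it assumes the endpoint is called with an empty POST.

```python
import http.server
import threading
import unittest
import urllib.request


def send_quitquitquit(port: int) -> None:
    """Hypothetical client code under test: an empty POST to /quitquitquit."""
    request = urllib.request.Request(
        "http://127.0.0.1:%d/quitquitquit" % port, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Stub service that records the path of every POST it receives."""
    paths = []

    def do_POST(self):
        self.paths.append(self.path)  # recorded before the reply is sent
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep test output quiet


class QuitQuitQuitTest(unittest.TestCase):
    def setUp(self):
        RecordingHandler.paths = []
        # Port 0 lets the OS pick a free port for the stub service.
        self.server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
        threading.Thread(target=self.server.serve_forever, daemon=True).start()

    def tearDown(self):
        self.server.shutdown()
        self.server.server_close()

    def test_quitquitquit_is_hit(self):
        send_quitquitquit(self.server.server_address[1])
        self.assertEqual(RecordingHandler.paths, ["/quitquitquit"])
```

Because the handler records the path before replying, the assertion cannot race the server thread. The same harness could serve a 404 instead to exercise the "endpoint not implemented" case from Stephan's original mail.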


>
> Brian
>
>
>> [1] https://issues.apache.org/jira/browse/AURORA-1257
>> [2] https://reviews.apache.org/r/32889/
>>
>> ________________________________________
>> From: Brian Brazil <brian.brazil@boxever.com>
>> Sent: Tuesday, March 24, 2015 10:48 PM
>> To: dev@aurora.incubator.apache.org
>> Subject: Re: Graceful task shutdown
>>
>> On 24 March 2015 at 21:33, George Sirois <george@tellapart.com> wrote:
>>
>> > Unfortunately I don't think my change will be able to make it in as-is.
>> >
>> > As Brian Wickman pointed out, it could introduce serious problems because
>> > the scheduler and executor use different timeouts: if you set your wait
>> > time too high, the scheduler might start to consider the tasks lost
>> > because they stayed in the transient KILLING state for too long.
>> >
>>
>> Hmm, what sort of work is involved in resolving that?
>>
>> In my case I need at least 12s after the /quitquitquit before sending the
>> TERM.
>>
>> Brian
>>
>>
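George's concern means any configurable wait has to fit inside the time the scheduler tolerates a task in KILLING. A minimal sketch of that check, assuming the worst case is one full wait per HTTP escalation step (/quitquitquit, then /abortabortabort); `killing_timeout` is a hypothetical parameter standing in for whatever limit the scheduler actually enforces.

```python
def check_escalation_budget(wait_per_step: float, steps: int,
                            killing_timeout: float) -> float:
    """Return the worst-case seconds a task spends in KILLING if every HTTP
    escalation step is waited out, or raise ValueError if that would reach
    the (hypothetical) scheduler-side killing_timeout."""
    if wait_per_step < 0 or steps < 0:
        raise ValueError("wait_per_step and steps must be non-negative")
    worst_case = wait_per_step * steps
    if worst_case >= killing_timeout:
        raise ValueError(
            "escalation needs %.1fs but the scheduler only allows %.1fs "
            "in KILLING" % (worst_case, killing_timeout))
    return worst_case
```

With arbitrary illustrative values, a 12s wait over two steps needs 24s, so `check_escalation_budget(12.0, 2, 60.0)` returns `24.0` while `check_escalation_budget(12.0, 2, 20.0)` raises.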
>> >
>> > I do think the lifecycle modules idea would solve Stephan's issue.
>> >
>> > On Tue, Mar 24, 2015 at 5:06 PM, Brian Brazil <brian.brazil@boxever.com>
>> > wrote:
>> >
>> > > On 24 March 2015 at 20:57, Erb, Stephan <Stephan.Erb@blue-yonder.com>
>> > > wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > we are implementing the /health endpoint in our services but omit the
>> > > > implementation of the unauthenticated lifecycle methods /quitquitquit
>> > > > and /abortabortabort.
>> > > >
>> > > > As a consequence, stopping a service incurs a 10-second wait [1]. I
>> > > > would like to get rid of this unnecessary delay and can think of two
>> > > > solutions:
>> > > >
>> > > > a) Only perform the escalation wait when the http_signaler reports that
>> > > > the message could be delivered to the service. This is a rather simple
>> > > > and localized fix.
>> > > >
>> > > > b) Use another port for lifecycle events. This would require a new
>> > > > addition to the task configuration and proper plumbing throughout the
>> > > > rest of the system. Backward compatibility could be achieved by using
>> > > > 'health' as the default lifecycle management port.
>> > > >
>> > > > Any thoughts? I would be happy with the simple solution, but in the end
>> > > > it's your call :-)
>> > > >
>> > >
>> > > __george mentioned on IRC that he is working on a change that will make
>> > > the wait time configurable (which is something I also need). Would that
>> > > cover your use case?
>> > >
>> > > There were also discussions on IRC about custom lifecycle modules.
>> > >
>> > > Brian
>> > >
>> > >
>> > > >
>> > > > Best Regards,
>> > > > Stephan
>> > > >
>> > > > [1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/executor/thermos_task_runner.py#L123
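Stephan's two options can be sketched together. This is a hypothetical illustration, not the code in thermos_task_runner.py: `lifecycle_port` models option (b) with an assumed 'lifecycle' port name that falls back to 'health'; `http_signal` reports whether a request was actually delivered (assuming the endpoints take an empty POST); and `graceful_stop` models option (a) by paying the configurable `escalation_wait` only after a delivered request.

```python
import http.client
import urllib.request
from typing import Callable, Dict, Optional


def lifecycle_port(ports: Dict[str, int]) -> Optional[int]:
    """Option (b): prefer a dedicated 'lifecycle' port (hypothetical name),
    falling back to 'health' for backward compatibility."""
    return ports.get("lifecycle", ports.get("health"))


def http_signal(port: int, endpoint: str, timeout: float = 1.0) -> bool:
    """POST to a lifecycle endpoint; True only on a successful response."""
    request = urllib.request.Request(
        "http://127.0.0.1:%d/%s" % (port, endpoint), data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # HTTPError (e.g. 404 for an unimplemented endpoint), refused
        # connections and timeouts are all OSError subclasses.
        return False


def graceful_stop(signal: Callable[[str], bool],
                  is_stopped: Callable[[], bool],
                  sleep: Callable[[float], None],
                  escalation_wait: float) -> str:
    """Option (a): walk the HTTP escalation, waiting only after a delivered
    request. Returns the endpoint that stopped the task, or 'SIGTERM' when
    the caller must fall back to process signals."""
    for endpoint in ("quitquitquit", "abortabortabort"):
        if not signal(endpoint):
            continue  # nothing answered: skip the wait instead of paying it
        sleep(escalation_wait)
        if is_stopped():
            return endpoint
    return "SIGTERM"
```

Wired up for real, `signal` would be `lambda e: http_signal(port, e)` with `port = lifecycle_port(task_ports)` and `sleep` would be `time.sleep`; injecting them keeps the wait logic testable without a live process. A service that implements only /health would typically answer the lifecycle endpoints with an error status, so it would skip both waits.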
>> > >
>> >
>>
>
>
