aurora-dev mailing list archives

From Brian Brazil <brian.bra...@boxever.com>
Subject Re: Graceful task shutdown
Date Tue, 24 Mar 2015 21:48:21 GMT
On 24 March 2015 at 21:33, George Sirois <george@tellapart.com> wrote:

> Unfortunately I don't think my change will be able to make it in as-is.
>
> As Brian Wickman pointed out, it could introduce serious problems because
> there are varying timeouts across the scheduler/executor, so if you set
> your wait time to be too high, the scheduler might start to consider the
> tasks lost because they stayed in the transient KILLING state for too long.
>

Hmm, what sort of work is involved in resolving that?

In my case I need at least 12s after the /quitquitquit before sending the TERM.

Brian


>
> I do think the lifecycle modules idea would solve Stephan's issue.
>
> On Tue, Mar 24, 2015 at 5:06 PM, Brian Brazil <brian.brazil@boxever.com>
> wrote:
>
> > On 24 March 2015 at 20:57, Erb, Stephan <Stephan.Erb@blue-yonder.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > we are implementing the /health endpoint in our services but omit the
> > > implementation of the unauthenticated lifecycle methods /quitquitquit
> > > and /abortabortabort.
> > >
> > > As a consequence, stopping a service is taxed by 10 seconds waiting
> > > time [1]. I would like to get rid of this unnecessary delay and can
> > > think of two solutions:
> > >
> > > a) Only perform the escalation wait when the http_signaler reports that
> > > the message could be delivered to the service. This is a rather simple
> > > and localized fix.
> > >
> > > b) Use another port for lifecycle events. This would require a new
> > > addition to the task configuration and proper plumbing throughout the
> > > rest of the system. Backward compatibility could be achieved by using
> > > 'health' as the default lifecycle management port.
> > >
> > > Any thoughts? I would be happy with the simple solution, but in the end
> > > it's your call :-)
> > >
> >
> > __george mentioned on IRC working on a change that'll let the wait time
> > be configurable (which is something I also need), would that cover your
> > use case?
> >
> > There were also discussions on IRC about custom lifecycle modules.
> >
> > Brian
> >
> >
> > >
> > > Best Regards,
> > > Stephan
> > >
> > > [1]
> > > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/executor/thermos_task_runner.py#L123
> >
>

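For readers following the thread, here is a minimal sketch of the two ideas discussed above: Stephan's option (a), which skips the wait when the HTTP lifecycle request could not be delivered, and a configurable wait that is checked against an upper bound so a task does not linger in KILLING for too long. It is not the Thermos runner's actual code. The function escalate_shutdown, its parameters, the 5-second default, the 60-second limit, and the SIGTERM/SIGKILL ordering are all assumptions made for illustration.

```python
import time
from typing import Callable, List


def escalate_shutdown(
    send_http: Callable[[str], bool],
    send_signal: Callable[[str], None],
    is_alive: Callable[[], bool],
    wait_secs: float = 5.0,        # assumed per-step wait, configurable
    max_total_secs: float = 60.0,  # assumed stand-in for the scheduler's limit
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Escalate a shutdown and return the steps that were attempted.

    send_http(path) must return True only if the request was delivered.
    All names here are hypothetical, not part of Aurora's API.
    """
    # Worst case is two HTTP waits plus one wait after SIGTERM. Reject a
    # wait that could keep the task in KILLING longer than the limit.
    if 3 * wait_secs > max_total_secs:
        raise ValueError("wait_secs is too large for max_total_secs")

    steps: List[str] = []
    for path in ("/quitquitquit", "/abortabortabort"):
        if not is_alive():
            return steps
        steps.append(path)
        # Option (a): wait only if the service actually received the request.
        if send_http(path):
            sleep(wait_secs)

    if is_alive():
        steps.append("SIGTERM")
        send_signal("SIGTERM")
        sleep(wait_secs)
    if is_alive():
        steps.append("SIGKILL")
        send_signal("SIGKILL")
    return steps
```

With a service that does not implement the lifecycle endpoints, send_http returns False for both paths, so the only wait left is the one after SIGTERM. A use case like the 12 seconds mentioned above would pass wait_secs=12.0, which the sketch accepts only while three such waits fit within max_total_secs.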