aurora-dev mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Finalization wait timeout in thermos executor for a task's teardown sequence
Date Thu, 19 May 2016 01:51:00 GMT
I don't see much problem in making it configurable at the executor level.

Just to make sure, though: are you running your executors with the fix from
AURORA-1642 (https://issues.apache.org/jira/browse/AURORA-1642)?

We had a similar problem where any kill took exactly 1 minute to complete,
hence the above fix.

On Wed, May 18, 2016 at 5:46 PM, Igor Morozov <igmorv@gmail.com> wrote:

> Folks,
>
> We need to support a use case here at Uber where service processes
> don't respect the SIGTERM signal and get killed only after the default
> hardcoded preemption timeout of 1 minute during a task kill or task
> restart. That significantly slows down the upgrade workflow for such
> services. We'd like to control this timeout, essentially reducing it to
> 5-10 seconds.
>
> My current thinking is to expose the preemption wait timeout
>
> class ThermosTaskRunner(TaskRunner):
> ....
> THERMOS_PREEMPTION_WAIT = Amount(1, Time.MINUTES)
>
> in the thermos executor flags and set it in
> DefaultThermosTaskRunnerProvider, eventually propagating it to all
> ThermosRunner tasks.
>
> A proper fix would probably be something along the lines of making this
> timeout configurable per task config, but that would involve changing
> the pystachio thermos schema.
>
> Thoughts?
>
> -Igor Morozov
>
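
For readers following the thread, the behaviour under discussion (send
SIGTERM, wait for a bounded period, then force-kill) can be sketched in plain
Python. This is a minimal illustration and not the thermos executor's code:
the flag name --preemption-wait-secs and the helpers parse_preemption_wait and
stop_process are hypothetical, and only the 1-minute default is taken from the
message above.

```python
import argparse
import subprocess

# Default taken from the thread: the hardcoded wait is 1 minute.
DEFAULT_PREEMPTION_WAIT_SECS = 60.0


def parse_preemption_wait(argv):
    """Read the wait from a (hypothetical) executor-level command-line flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--preemption-wait-secs',
        type=float,
        default=DEFAULT_PREEMPTION_WAIT_SECS,
        help='Seconds to wait after SIGTERM before sending SIGKILL.')
    return parser.parse_args(argv).preemption_wait_secs


def stop_process(proc, preemption_wait_secs):
    """Ask proc to exit, escalating to SIGKILL once the wait expires.

    Returns True if the process had to be force-killed, False if it
    exited on its own within the wait.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=preemption_wait_secs)
        return False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()  # reap the child so it does not linger as a zombie
        return True
```

With a sketch like this, a service that ignores SIGTERM is stopped after the
configured wait (for example 5-10 seconds) rather than after the full minute.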
