aurora-dev mailing list archives

From Igor Morozov <igm...@gmail.com>
Subject Re: Finalization wait timeout in thermos executor for a task's teardown sequence
Date Thu, 19 May 2016 02:31:27 GMT
Yes, we are running 0.13.1.

OK then, I'll file a task and prepare a patch for review.

Thanks,
-Igor

On Wed, May 18, 2016 at 6:51 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> I don't see much problem in making it configurable at the executor level.
>
> Just to make sure though, are you running your executors with this fix:
> https://issues.apache.org/jira/browse/AURORA-1642?
>
> We had a similar problem where any kill took exactly 1 minute to complete,
> hence the above fix.
>
> On Wed, May 18, 2016 at 5:46 PM, Igor Morozov <igmorv@gmail.com> wrote:
>
>> Folks,
>>
>> We need to support a use case here at Uber where service processes
>> don't respect the SIGTERM signal and only get killed after the default
>> hardcoded preemption timeout of 1 minute during a task kill or task
>> restart. That significantly slows down the upgrade workflow for such
>> services. We'd like to control this timeout, essentially reducing it to
>> 5-10 seconds.
>>
>> My current thinking is to expose the preemption wait timeout
>>
>>   class ThermosTaskRunner(TaskRunner):
>>     ....
>>     THERMOS_PREEMPTION_WAIT = Amount(1, Time.MINUTES)
>>
>> in the thermos executor flags and set it in
>> DefaultThermosTaskRunnerProvider, eventually propagating it to all
>> ThermosRunner tasks.
>>
>> A proper fix would probably be something along the lines of making this
>> timeout configurable per task config, but that would involve changing
>> the pystachio thermos schema.
>>
>> Thoughts?
>>
>> -Igor Morozov
>>
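For reference, the executor-level approach described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Aurora code: the classes are simplified stand-ins that borrow the names from the thread, the `--preemption-wait-secs` flag and the `make_runner` and `build_provider` helpers are hypothetical names chosen for this sketch, and plain seconds are used instead of `Amount(1, Time.MINUTES)` so it runs without extra dependencies.

```python
import argparse


class ThermosTaskRunner:
    """Simplified stand-in for the real runner; only the timeout handling is shown."""

    # Default mirrors the hardcoded 1-minute value quoted in the thread.
    THERMOS_PREEMPTION_WAIT_SECS = 60.0

    def __init__(self, preemption_wait_secs=None):
        # Fall back to the class-level default when no override is given.
        if preemption_wait_secs is None:
            preemption_wait_secs = self.THERMOS_PREEMPTION_WAIT_SECS
        if preemption_wait_secs < 0:
            raise ValueError("preemption wait must be non-negative")
        self.preemption_wait_secs = preemption_wait_secs


class DefaultThermosTaskRunnerProvider:
    """Simplified stand-in: passes the executor-level timeout to every runner it creates."""

    def __init__(self, preemption_wait_secs=None):
        self._preemption_wait_secs = preemption_wait_secs

    def make_runner(self):
        # Hypothetical factory method; the real provider takes more arguments.
        return ThermosTaskRunner(preemption_wait_secs=self._preemption_wait_secs)


def build_provider(argv):
    """Parse executor flags and build a provider carrying the configured timeout."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag name, used only in this sketch.
    parser.add_argument("--preemption-wait-secs", type=float, default=None)
    args = parser.parse_args(argv)
    return DefaultThermosTaskRunnerProvider(preemption_wait_secs=args.preemption_wait_secs)
```

With this shape, an executor started with `--preemption-wait-secs 5` would hand a 5-second wait to every runner it creates, while omitting the flag keeps the existing 1-minute default.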



-- 
-Igor
