aurora-dev mailing list archives

From Maxim Khutornenko <>
Subject Re: reasonable preemption delay to use
Date Tue, 17 Feb 2015 17:36:31 GMT
The watch_secs clock only starts once a task enters RUNNING. For the
rolling update not to fail early, restart_threshold [1] needs to be
raised to account for the preemption delay.
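To make the timing concrete, here is a minimal Python sketch. It assumes a worst case where an updated instance stays PENDING for the full preemption delay before resources are freed. The helper names (`UpdateTimings`, `min_restart_threshold`, `may_fail_early`) and the `startup_secs` allowance are hypothetical and not part of Aurora; only `restart_threshold` and `watch_secs` come from the UpdateConfig fields discussed in this thread.

```python
from dataclasses import dataclass


# Hypothetical model: field names mirror the UpdateConfig settings discussed
# in the thread, but this class is NOT Aurora's API.
@dataclass
class UpdateTimings:
    restart_threshold: int  # seconds an instance has to reach RUNNING
    watch_secs: int         # not involved below: its clock starts only at RUNNING


def min_restart_threshold(preemption_delay_secs: int, startup_secs: int) -> int:
    """Smallest restart_threshold that tolerates a full preemption wait.

    Assumes the instance sits in PENDING for the whole preemption delay,
    then needs startup_secs (an assumed allowance) to be scheduled and
    reach RUNNING once preemption frees resources.
    """
    return preemption_delay_secs + startup_secs


def may_fail_early(timings: UpdateTimings,
                   preemption_delay_secs: int,
                   startup_secs: int) -> bool:
    """True if the update could give up before preemption has a chance to run."""
    return timings.restart_threshold < min_restart_threshold(
        preemption_delay_secs, startup_secs)
```

For example, with a 2-minute preemption delay and a 60-second startup allowance, a restart_threshold of 60 could fail the update early, while 180 covers the worst case. Note that watch_secs plays no part in this check, which is why raising it alone does not help.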

As for the default preemption delay, it was chosen to avoid
unnecessary churn in the cluster. Larger or constraint-diverse tasks
take longer to place, so there can be occasional scheduling delays
when resources are tight; hence the grace buffer. You can definitely
dial it in given the specifics of your cluster.
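A rough model of the trade-off, as a Python sketch: it assumes a pending production task becomes eligible to preempt non-production tasks once it has waited at least the preemption delay. `eligible_to_preempt` is a hypothetical helper, not scheduler code.

```python
from datetime import datetime, timedelta

# Default stated in the thread for the scheduler's preemption_delay flag.
DEFAULT_DELAY = timedelta(minutes=10)


def eligible_to_preempt(pending_since: datetime,
                        now: datetime,
                        preemption_delay: timedelta = DEFAULT_DELAY) -> bool:
    """Hypothetical rule: a pending production task may preempt
    non-production tasks once it has been PENDING for preemption_delay."""
    return now - pending_since >= preemption_delay
```

With the 2-minute value from the original question, a task pending for 3 minutes would already be eligible, whereas under the 10-minute default it would keep waiting. A shorter delay gets production work running sooner, at the cost of killing more non-production tasks during transient placement delays that would have resolved on their own.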


[1] -

On Tue, Feb 17, 2015 at 12:51 AM, Erb, Stephan
<> wrote:
> If I remember correctly, you also have to make sure that your UpdateConfig watch_secs
> is larger than your preemption_delay. Otherwise a rolling update of a production job might
> not be able to get the resources it needs.
> Best Regards,
> Stephan
> ________________________________________
> From: Bhuvan Arumugam <>
> Sent: Monday, February 16, 2015 7:14 AM
> To:
> Subject: reasonable preemption delay to use
> Hello,
> Recently, in one of our clusters we noticed production jobs going to
> PENDING state due to insufficient CPU. The non-production jobs are
> not preempted, as we haven't set the --preemption_delay flag for the
> scheduler. The default value for this flag is 10mins. Why is it so
> high? Is there any reasoning behind using 10mins as the default?
> We are thinking of using 2mins for this flag; we wouldn't want to
> wait more than 2mins to run a prod job under resource constraints. Does
> that sound reasonable? What's the typical preemption delay used by SREs?
> --
> Regards,
> Bhuvan Arumugam
