mesos-issues mailing list archives

From "Anindya Sinha (JIRA)" <>
Subject [jira] [Commented] (MESOS-7087) Consider improving exponential backoff algorithm.
Date Thu, 09 Feb 2017 18:13:41 GMT


Anindya Sinha commented on MESOS-7087:

Here is a write up on a proposal to address this situation:

Comments/feedback welcome.

> Consider improving exponential backoff algorithm.
> -------------------------------------------------
>                 Key: MESOS-7087
>                 URL:
>             Project: Mesos
>          Issue Type: Improvement
>          Components: general
>            Reporter: Anindya Sinha
>            Assignee: Anindya Sinha
> There are 3 types of backoff algorithms in use:
> 1) Exponential backoff with randomness, as in framework/agent registration.
> 2) Exponential backoff with no randomness, as in status updates.
> 3) Linear backoff with randomness, as in executor registration.
> Consider framework registration. The nth retry attempt is made after a random interval
drawn from the range [0 .. backoff * 2^(n-1)], as long as each interval is less than 1 min.
The default value for backoff is 2 secs.
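
To make the scheme above concrete, here is a minimal Python sketch of the framework registration retry delay as described. It is not the Mesos implementation. The names `upper_bound` and `current_retry_delay` are hypothetical, and treating the 1 min limit as a cap on the upper end of the range is an assumption about what "less than 1 min" means.

```python
import random


def upper_bound(n, backoff=2.0, max_interval=60.0):
    """Upper end of the allowed range for the nth retry (n >= 1), in seconds.

    Assumption: the 1 min limit is modeled as a cap on the range.
    """
    return min(backoff * 2 ** (n - 1), max_interval)


def current_retry_delay(n, backoff=2.0, max_interval=60.0, rng=random):
    """Random delay in [0, upper_bound(n)] before the nth retry."""
    return rng.uniform(0.0, upper_bound(n, backoff, max_interval))


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, upper_bound(n), round(current_retry_delay(n), 2))
```

Because the lower end of the range is always 0, any retry can be scheduled almost immediately, no matter how large n is.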
> Although the current approach combines exponential backoff with randomness, we have
observed that in clusters with thousands of agents and/or frameworks, the actual (randomized)
retry interval can end up being very short for a substantial number of agents and/or
frameworks, because the allowed range [0 .. <n>] starts at 0. This bombards the master
with messages and overloads it.
> So, the main points to address (especially with a large number of frameworks and/or agents) are:
> 1) Every subsequent retry should be spaced by a minimum deterministic amount from
the previous attempt.
> 2) Every subsequent retry interval should be greater than or equal to the previous one.
> 3) The maximum retry interval should be configurable, since it can be a function of the initial
backoff factor.
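
As an illustration only, the three points above could be met by drawing the nth delay from [backoff * 2^(n-1) .. backoff * 2^n] instead of [0 .. backoff * 2^(n-1)], with both ends capped at a configurable maximum. This is one possible scheme, not necessarily the one in the proposal write-up. The function name `bounded_retry_delay` and its parameters are hypothetical.

```python
import random


def bounded_retry_delay(n, backoff=2.0, max_interval=60.0, rng=random):
    """Delay in seconds before the nth retry (n >= 1).

    Hypothetical scheme: the lower end of the range for attempt n equals
    the upper end for attempt n - 1, so delays never decrease, and every
    delay is at least `backoff` (assuming backoff <= max_interval).
    """
    lower = min(backoff * 2 ** (n - 1), max_interval)
    upper = min(backoff * 2 ** n, max_interval)
    return rng.uniform(lower, upper)


if __name__ == "__main__":
    rng = random.Random(42)
    for n in range(1, 8):
        print(n, round(bounded_retry_delay(n, rng=rng), 2))
```

One trade-off of this sketch: once the cap is reached, both ends of the range equal `max_interval`, so the randomness disappears and all clients at the cap retry at the same interval.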

This message was sent by Atlassian JIRA
