mesos-dev mailing list archives

From Anindya Sinha <anindya_si...@apple.com>
Subject Exponential Backoff
Date Mon, 13 Feb 2017 05:03:39 GMT
Reference: https://issues.apache.org/jira/browse/MESOS-7087

Currently, we have at least 3 types of backoff:
1) Exponential backoff with randomness, as in framework/agent registration.
2) Exponential backoff with no randomness, as in status updates.
3) Linear backoff with randomness, as in executor registration.
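
To make the distinction concrete, here is a rough sketch of the three styles. The function
names, the `cap` parameter, and the exact formulas for types 2 and 3 are assumptions made for
illustration; they are not taken from the Mesos sources.

```python
import random


def exponential_with_jitter(base, attempt, cap, rng=random):
    """Type 1 (assumed form): uniform draw from [0, min(cap, base * 2^(attempt-1))]."""
    upper = min(cap, base * 2 ** (attempt - 1))
    return rng.uniform(0.0, upper)


def exponential_no_jitter(base, attempt, cap):
    """Type 2 (assumed form): deterministic doubling, bounded by cap."""
    return min(cap, base * 2 ** (attempt - 1))


def linear_with_jitter(base, attempt, cap, rng=random):
    """Type 3 (assumed form): uniform draw from [0, min(cap, base * attempt)]."""
    upper = min(cap, base * attempt)
    return rng.uniform(0.0, upper)
```

Note that both jittered variants can return values arbitrarily close to 0, whatever the
attempt number.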

In framework registration, for example, the delay before the nth retry attempt is chosen at
random from [0 .. b*2^(n-1)], where b is the base backoff interval, as long as each interval
is less than 1 min.

For clusters with a large number of frameworks and/or agents, the randomness may not be
enough: the timeout can end up being very small for a substantial number of clients (agents
and/or frameworks), because the lower bound of the allowed range is 0, i.e. the range is
[0 .. <n>], for all retry attempts.
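
A quick back-of-the-envelope check of this concern, assuming the delay is drawn uniformly
from [0 .. upper]: the fraction of clients whose delay falls below some threshold t is
t / upper. The helper names and the numbers in the usage note are hypothetical.

```python
def fraction_below(threshold, upper):
    """P(delay < threshold) when delay is uniform on [0, upper] (assumed distribution)."""
    if upper <= 0:
        raise ValueError("upper must be positive")
    return min(1.0, threshold / upper)


def expected_small_timeouts(clients, threshold, base, attempt, cap=60.0):
    """Expected number of clients whose nth-retry delay is below `threshold` seconds."""
    upper = min(cap, base * 2 ** (attempt - 1))
    return clients * fraction_below(threshold, upper)
```

For instance, with 10000 clients, b = 2 s, and the 3rd retry (range [0 .. 8 s]),
`expected_small_timeouts(10000, 0.5, 2.0, 3)` evaluates to 625.0, i.e. roughly 625 clients
would be expected to retry within half a second.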

The following doc looks at an enhancement to the existing proposal to ensure that the timeout
values are not extremely small, and that every subsequent retry has a timeout value at least
as large as that of the previous iteration.
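
The scheme in the doc is not reproduced in this message. Purely as an illustration of the two
stated goals (no extremely small timeouts, and non-decreasing timeouts), one hypothetical
approach is to raise the lower bound of the random range. The `floor_fraction` parameter and
the function name below are assumptions, not part of the proposal.

```python
import random


def monotonic_backoff(base, attempts, cap=60.0, floor_fraction=0.5, rng=random):
    """Return a list of retry delays that never decrease (illustrative sketch only).

    Each delay is drawn uniformly from [lower, upper], where upper is the usual
    capped exponential bound and lower is the larger of the previous delay and
    a fixed fraction of upper (floor_fraction is assumed to be in [0, 1]).
    """
    delays = []
    previous = 0.0
    for attempt in range(1, attempts + 1):
        upper = min(cap, base * 2 ** (attempt - 1))
        lower = max(previous, floor_fraction * upper)
        delay = rng.uniform(lower, upper)
        delays.append(delay)
        previous = delay
    return delays
```

Because the upper bound never shrinks and each previous delay is at most its own upper
bound, the lower bound can never exceed the upper bound, so the range stays valid on every
attempt.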

https://docs.google.com/document/d/1nUxvh6BbB8jv5G-MvckGj9XzFYLBrUM0O5Go_Zmdftk/edit?usp=sharing

Feedback welcome.

Thanks
Anindya

