hadoop-yarn-issues mailing list archives

From "Neelesh Srinivas Salian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry
Date Sun, 04 Oct 2015 01:43:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942528#comment-14942528 ]

Neelesh Srinivas Salian commented on YARN-4185:

1) With the exponentialBackoffRetry policy, the wait time starts at 1 second (assuming the
NM takes about a second to come up) and doubles on each retry.
The backoff therefore grows as 1, 2, 4, 8, 16, ... up to 512 seconds by the 10th retry.

2) In the current strategy, the wait time is fixed at 10 seconds, so the client waits the
full 10 seconds before its first retry even when the NM restarted in 1 second.

3) By the 3rd retry, the cumulative wait is 7 seconds (1+2+4) with the exponential strategy
versus 30 seconds (10+10+10) with the current static retry.

4) If you keep retrying, by the 6th retry attempt the static retry has waited 60 seconds in
total versus 2^6 - 1 = 63 seconds (1+2+4+8+16+32) in the exponential strategy, so from this
point on the plain exponential strategy is no longer the cheaper one.
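
The arithmetic in points 1-4 can be checked with a short script. This is only a sketch of the
two schedules, not the Hadoop implementation; the function names and the 1-second base are
assumptions made for illustration.

```python
from itertools import accumulate

def exponential_waits(retries, base=1):
    """Wait (seconds) before each retry: base, 2*base, 4*base, ..."""
    return [base * 2 ** i for i in range(retries)]

def static_waits(retries, interval=10):
    """Wait (seconds) before each retry with a fixed interval."""
    return [interval] * retries

exp = exponential_waits(10)
fixed = static_waits(10)

# Running totals: element n-1 is the cumulative wait after n retries.
exp_total = list(accumulate(exp))
fixed_total = list(accumulate(fixed))

for n in (3, 6, 10):
    print(f"after {n} retries: exponential={exp_total[n - 1]}s, static={fixed_total[n - 1]}s")
```

The last individual wait in the exponential schedule is 512 seconds, and its cumulative wait
overtakes the static one at the 6th retry (63 seconds versus 60 seconds).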

Logic for the Design:
1) With the number of retries defaulting to 10:
   a. I propose that after the 3rd attempt we hold the wait time at 4 seconds for all
remaining retries.
   The total wait then comes to 1+2+4+4+4+4+4+4+4+4 = 35 seconds.
   b. The static retry strategy, by comparison, spends 100 seconds waiting in total.

2) Alternatively, the logic could be:
   a. Make the first 3 retry attempts with exponential backoff. If more are needed, reset to
the 1-second start and repeat the same progression.
      The waits then look like (1,2,4) (1,2,4) (1,2,4) (1) for 10 retries.
   b. The 10 retries then take 22 seconds of waiting in total versus 100 seconds.
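
Both proposed schedules can be sketched as follows. The function names, the cap of 4 seconds
and the cycle length of 3 are taken from the description above or chosen for illustration;
nothing here is existing Hadoop code.

```python
def capped_waits(retries, cap=4, base=1):
    """Option 1: double the wait until it reaches `cap`, then hold it there."""
    return [min(base * 2 ** i, cap) for i in range(retries)]

def cyclic_waits(retries, cycle=3, base=1):
    """Option 2: double the wait for `cycle` retries, then restart from `base`."""
    return [base * 2 ** (i % cycle) for i in range(retries)]

capped = capped_waits(10)   # 1, 2, 4, 4, 4, 4, 4, 4, 4, 4
cyclic = cyclic_waits(10)   # 1, 2, 4, 1, 2, 4, 1, 2, 4, 1
static_total = 10 * 10      # ten retries at the fixed 10-second interval

print("capped:", capped, "total", sum(capped))
print("cyclic:", cyclic, "total", sum(cyclic))
print("static total:", static_total)
```

The totals come out to 35 seconds for the capped schedule and 22 seconds for the cyclic one,
against 100 seconds for the static interval.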

Requesting feedback.
Thank you.

> Retry interval delay for NM client can be improved from the fixed static retry 
> -------------------------------------------------------------------------------
>                 Key: YARN-4185
>                 URL: https://issues.apache.org/jira/browse/YARN-4185
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anubhav Dhoot
>            Assignee: Neelesh Srinivas Salian
> Instead of having a fixed retry interval that starts off very high and stays there, we
> are better off using an exponential backoff that has the same fixed max limit. Today the retry
> interval is fixed at 10 sec, which can be unnecessarily high, especially when NMs can complete
> a rolling restart within a sec.
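
One reading of "an exponential backoff that has the same fixed max limit" is a retry loop whose
wait doubles from 1 second and is capped at the current 10-second interval. The sketch below
assumes that reading; `retry_with_backoff`, `flaky_connect` and the injectable `sleep`
parameter are hypothetical names, not part of the NM client.

```python
import time

def retry_with_backoff(operation, max_retries=10, base=1.0,
                       max_interval=10.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_retries retries are used up.

    Returns (result, waits), where waits lists the delay before each retry.
    """
    waits = []
    for attempt in range(max_retries + 1):
        try:
            return operation(), waits
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last failure
            wait = min(base * 2 ** attempt, max_interval)  # capped doubling
            waits.append(wait)
            sleep(wait)

# Hypothetical stand-in for an NM that only accepts the 6th connection attempt.
calls = {"n": 0}

def flaky_connect():
    calls["n"] += 1
    if calls["n"] <= 5:
        raise ConnectionError("NM not up yet")
    return "connected"

# Pass a no-op sleep so the example runs instantly.
result, waits = retry_with_backoff(flaky_connect, sleep=lambda seconds: None)
print(result, waits)
```

With this cap, an NM that comes back within a second is retried after 1 second rather than 10,
while a long outage never waits more than 10 seconds between attempts.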

This message was sent by Atlassian JIRA
