hadoop-yarn-issues mailing list archives

From "Neelesh Srinivas Salian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry
Date Tue, 06 Oct 2015 01:42:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944384#comment-14944384 ]

Neelesh Srinivas Salian commented on YARN-4185:
-----------------------------------------------

[~adhoot], thanks for the clarification.
So the initial retries can use backoff times of 1, 2, 4, and 8 seconds, each still less than
10 seconds, which gives a retry the chance to succeed after a short-lived NM restart (under
10 seconds). After that, we can keep waiting 10 seconds between retries to accommodate a
longer failure.

Thus, the wait times would be 1, 2, 4, 8, 10, 10 seconds, and so on until the number of
retries is exhausted.
My only concern is that if the failure lasts longer than the total wait time across all
retries, there won't be another chance to retry.
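
To make the schedule and the concern concrete, here is a minimal sketch of a capped
exponential backoff. It is not the NM client code or the eventual patch; the class name,
method names, and the retry count of 6 are hypothetical and chosen only for illustration.
It assumes a 1 second base delay and the 10 second cap discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
public class CappedBackoffSketch {

    // Delay in seconds before the given retry attempt (0-based).
    // Doubles from baseSeconds and never exceeds maxSeconds.
    public static long delaySeconds(int attempt, long baseSeconds, long maxSeconds) {
        long delay = baseSeconds;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 0; i < attempt && delay < maxSeconds; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxSeconds);
    }

    // Full list of delays for maxRetries attempts.
    public static List<Long> schedule(int maxRetries, long baseSeconds, long maxSeconds) {
        List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(delaySeconds(attempt, baseSeconds, maxSeconds));
        }
        return delays;
    }

    // Longest outage the client can ride out: the sum of all delays.
    public static long totalWaitSeconds(int maxRetries, long baseSeconds, long maxSeconds) {
        long total = 0;
        for (long delay : schedule(maxRetries, baseSeconds, maxSeconds)) {
            total += delay;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed values: 6 retries, 1 s base delay, 10 s cap.
        System.out.println(schedule(6, 1, 10));         // [1, 2, 4, 8, 10, 10]
        System.out.println(totalWaitSeconds(6, 1, 10)); // 35
    }
}
```

With these assumed values the client gives up after 35 seconds in total, whereas six
retries at a fixed 10 seconds would cover 60 seconds. That gap is the concern above: the
retry count may need to grow if the total wait budget is meant to stay the same.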

I'll write up a patch to demonstrate this.
Thank you.

> Retry interval delay for NM client can be improved from the fixed static retry 
> -------------------------------------------------------------------------------
>
>                 Key: YARN-4185
>                 URL: https://issues.apache.org/jira/browse/YARN-4185
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anubhav Dhoot
>            Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays there, we
> are better off using an exponential backoff that has the same fixed max limit. Today the
> retry interval is fixed at 10 sec, which can be unnecessarily high, especially when NMs
> can complete a rolling restart within a second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
