[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331958#comment-14331958 ]
Xuan Gong commented on YARN-3238:
---------------------------------
Committed to trunk/branch-2. Thanks, Jason!
> Connection timeouts to nodemanagers are retried at multiple levels
> ------------------------------------------------------------------
>
> Key: YARN-3238
> URL: https://issues.apache.org/jira/browse/YARN-3238
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3238.001.patch
>
>
> The IPC layer will retry connection timeouts automatically (see Client.java), but we
> are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created.
> This causes a two-level retry mechanism where the IPC layer has already retried quite a few
> times (45 by default) for each YARN RetryPolicy error that is retried. The end result is
> that NM clients can wait a very, very long time for the connection to finally fail.
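
To make the compounding concrete, here is a rough sketch of the arithmetic in Python. It is not taken from the patch. The 45 comes from the description above; the 20-second connect timeout and the 3 RetryPolicy-level attempts are assumed values chosen only for illustration, the function name is hypothetical, and any sleep between retries is ignored.

```python
def worst_case_wait_seconds(ipc_attempts, connect_timeout_s, policy_attempts):
    """Rough upper bound on the time before a connection failure reaches the caller.

    Simplified model: every RetryPolicy-level attempt triggers a full round of
    IPC-level attempts, and every IPC-level attempt waits out the whole
    connect timeout before failing.
    """
    # Time consumed by one RetryPolicy-level attempt (inner retry loop).
    per_policy_attempt_s = ipc_attempts * connect_timeout_s
    # The outer retry loop multiplies that cost.
    return per_policy_attempt_s * policy_attempts


# 45 IPC-level attempts (from the description), assumed 20 s timeout,
# assumed 3 RetryPolicy-level attempts.
print(worst_case_wait_seconds(45, 20, 3))  # prints 2700
```

With only the IPC-level retries (a single RetryPolicy-level attempt), the same assumed numbers give 900 seconds; each extra outer attempt adds another 900.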
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)