hadoop-yarn-issues mailing list archives

From "Chang Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels
Date Thu, 07 Jan 2016 19:28:40 GMT

     [ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Chang Li updated YARN-4414:
    Attachment: YARN-4414.2.patch

Thanks [~jlowe] for the review!
Updated the .2 patch to remove getNMProxy2 and to implement getProxy() in terms of getProxy(Configuration).
I set the NM address to a dummy value, 1234, so that it triggers a connection error and
RPC-level retries.
{{BaseContainerManagerTest}} sets it to {code}"" + ServerSocketUtil.getPort(49162, 10);{code},
a normal address, so the RPC retry could not be triggered.

> Nodemanager connection errors are retried at multiple levels
> ------------------------------------------------------------
>                 Key: YARN-4414
>                 URL: https://issues.apache.org/jira/browse/YARN-4414
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1, 2.6.2
>            Reporter: Jason Lowe
>            Assignee: Chang Li
>         Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, YARN-4414.1.3.patch, YARN-4414.1.patch,
> This is related to YARN-3238.  Ran into more scenarios where connection errors, like
> NoRouteToHostException, are being retried at multiple levels.  The fix for YARN-3238 was too
> specific, and I think we need a more general solution that catches a wider array of connection
> errors, so that they are not retried both at the RPC layer and at the NM proxy layer.
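
To illustrate why retrying the same connection error at two layers is costly, here is a minimal, self-contained sketch. It does not use Hadoop's retry classes; NestedRetryDemo, retry, and countAttempts are hypothetical names, and the attempt counts are arbitrary. It only shows that when an inner (RPC-like) layer and an outer (proxy-like) layer each retry, the number of connection attempts multiplies.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class NestedRetryDemo {

    /** Calls the action up to maxAttempts times and rethrows the last failure. */
    public static <T> T retry(int maxAttempts, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Counts how many connection attempts happen when both layers retry. */
    public static int countAttempts(int rpcAttempts, int proxyAttempts) {
        AtomicInteger attempts = new AtomicInteger();

        // Stand-in for a connection that always fails (hypothetical).
        Callable<Void> failingConnect = () -> {
            attempts.incrementAndGet();
            throw new ConnectException("simulated connection failure");
        };

        // Inner layer: retries the connection itself.
        Callable<Void> rpcLayer = () -> retry(rpcAttempts, failingConnect);

        try {
            // Outer layer: retries the whole inner layer again.
            retry(proxyAttempts, rpcLayer);
        } catch (Exception expected) {
            // Every attempt fails in this sketch, so the final failure is expected.
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        // 3 inner attempts repeated by 4 outer attempts gives 12 attempts in total.
        System.out.println(countAttempts(3, 4));
    }
}
```

With real retry policies the waits between attempts compound in the same way, which is why a single layer should own the handling of a given class of connection errors.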

This message was sent by Atlassian JIRA
