hadoop-yarn-issues mailing list archives

From "zhaoyunjiong (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
Date Sun, 30 Mar 2014 02:37:15 GMT

     [ https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaoyunjiong reopened YARN-1888:
--------------------------------


The problem is that our cluster sets the NodeManager port to 0 (so the port is chosen at
startup), and after a NodeManager restart the "Lost Nodes" count becomes inaccurate:
Host A has a NodeManager with ID $HOSTA:$PORTA.
After a restart, the NodeManager has ID $HOSTA:$PORTB.
Because the ID changed, the ResourceManager does not treat it as a reconnected NodeManager.
A few minutes later, NodeManager $HOSTA:$PORTA expires and is marked as LOST.
This confuses people. At first I did not think it was a bug either, but after several people
asked me why so many nodes were LOST, I came up with this simple patch: if another NodeManager
is already running on the same host (in a real production cluster, I don't think people
start more than one NodeManager on one machine), then don't mark the expired NodeManager as LOST.
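
The rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the attached patch and not the ResourceManager's real code: the class LostNodeTracker and its methods are hypothetical, and nodes are tracked as plain "host:port" strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the ResourceManager's node bookkeeping; not the YARN API. */
class LostNodeTracker {
    // Active NodeManagers: "host:port" ID mapped to the host it runs on.
    private final Map<String, String> activeNodes = new HashMap<>();
    // IDs of NodeManagers that have been reported as LOST.
    private final Set<String> lostNodes = new HashSet<>();

    /** Records a NodeManager that registered with the given host and port. */
    void register(String host, int port) {
        activeNodes.put(host + ":" + port, host);
    }

    /** Called when a NodeManager stops heartbeating and expires. */
    void expire(String host, int port) {
        String nodeId = host + ":" + port;
        if (activeNodes.remove(nodeId) == null) {
            return; // unknown node, nothing to do
        }
        // Proposed rule: do not report LOST if another NodeManager
        // is still active on the same host (assumed to be the restarted one).
        if (!activeNodes.containsValue(host)) {
            lostNodes.add(nodeId);
        }
    }

    boolean isLost(String host, int port) {
        return lostNodes.contains(host + ":" + port);
    }

    int lostCount() {
        return lostNodes.size();
    }
}
```

With this rule, an expired $HOSTA:$PORTA is not counted as lost while $HOSTA:$PORTB is still registered; a host whose only NodeManager expires is still counted.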





> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-1888
>                 URL: https://issues.apache.org/jira/browse/YARN-1888
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: zhaoyunjiong
>            Priority: Minor
>         Attachments: YARN-1888.patch
>
>
> When the NodeManager's port is set to 0, restarting the NodeManager makes the "Lost Nodes" count inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
