hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1888) Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
Date Tue, 01 Apr 2014 16:03:19 GMT

    [ https://issues.apache.org/jira/browse/YARN-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956690#comment-13956690
] 

Jason Lowe commented on YARN-1888:
----------------------------------

I agree with [~kasha] on this.  A nodemanager coming up on a different port isn't necessarily
the same nodemanager from a previous instance.  For example, the minicluster runs multiple
nodes on the same host with different ports.  If one of these nodes disappears, won't this patch
stop it from being reported as lost, since others are still running on the same host?

I think the real fix is to run the nodemanager with a non-ephemeral nodemanager port specified
in yarn-site.xml.  This helps solve a number of issues:

# lost nodes count will be accurate
# a NM that reboots and rejoins the cluster before the RM expires the old instance will be
correctly recognized as the same NM, and we avoid the RM thinking there are really two NMs
on the host for up to the NM expiry interval
# attempts to start a subsequent NM on the same host where an NM is already running will fail
rather than accidentally overcommit the node
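
As a minimal sketch of the configuration suggested above, a fixed port can be set through the
yarn.nodemanager.address property in yarn-site.xml.  The port number 45454 below is an arbitrary
example, not a value taken from this issue; any free, non-zero port should serve the same purpose.

```xml
<!-- yarn-site.xml fragment (illustrative; the port value is an assumption) -->
<configuration>
  <property>
    <!-- Address of the container manager in the NM.  A port of 0 asks for
         an ephemeral port, which can change every time the NM restarts. -->
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
</configuration>
```

With a fixed port, the NM's host:port identity stays the same across restarts, and a second NM
started on the same host should fail to bind rather than register as a separate node.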

> Not add NodeManager to inactiveRMNodes when reboot NodeManager which have different port
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-1888
>                 URL: https://issues.apache.org/jira/browse/YARN-1888
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: zhaoyunjiong
>            Priority: Minor
>         Attachments: YARN-1888.patch
>
>
> When the NodeManager's port is set to 0, rebooting the NodeManager makes the "Lost Nodes" count inaccurate.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
