hadoop-yarn-issues mailing list archives

From "Bruno Alexandre Rosa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2299) inconsistency at identifying node
Date Wed, 19 Nov 2014 17:40:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218207#comment-14218207 ]

Bruno Alexandre Rosa commented on YARN-2299:


> inconsistency at identifying node
> ---------------------------------
>                 Key: YARN-2299
>                 URL: https://issues.apache.org/jira/browse/YARN-2299
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Critical
> If the port of "yarn.nodemanager.address" is not specified at the NM, the NM will choose a random port.
> If the NM dies ungracefully (OOM kill, kill -9, or OS restart) and is then restarted within
> "yarn.nm.liveness-monitor.expiry-interval-ms", "host:port1" and "host:port2" will both be
> present in "Active Nodes" on the WebUI for a while, and after host:port1 expires, we get host:port1
> in "Lost Nodes" and host:port2 in "Active Nodes". If the NM dies ungracefully again, we
> get only host:port1 in "Lost Nodes". "host:port2" is neither in "Active Nodes" nor in "Lost Nodes".
> Another case: when two NMs are running on the same host (miniYarnCluster or other test purposes),
> if both of them are lost, we get only one entry in "Lost Nodes" on the WebUI.
> In both cases, the sum of "Active Nodes" and "Lost Nodes" is not the number of nodes we expect.
> The root cause is an inconsistency in how we decide that two nodes are identical.
> When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains the port.
> Two nodes with the same host but different ports are considered different nodes.
> But when we manage inactive nodes (RMContextImpl.inactiveNodes), we use only the host.
> Two nodes with the same host but different ports are considered identical.
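
The mismatch described above can be reproduced with two plain maps. This is a minimal sketch, not code from the YARN source: the class NodeTrackingSketch and its methods are hypothetical, and the maps only imitate the keying that the description attributes to RMContextImpl.nodes (host and port) and RMContextImpl.inactiveNodes (host only).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in YARN.
public class NodeTrackingSketch {
    // Keyed by "host:port", imitating how active nodes are described as keyed by NodeId.
    private final Map<String, String> activeNodes = new HashMap<>();
    // Keyed by host only, imitating how inactive nodes are described as keyed.
    private final Map<String, String> inactiveNodes = new HashMap<>();

    void register(String host, int port) {
        activeNodes.put(host + ":" + port, "RUNNING");
    }

    void markLost(String host, int port) {
        if (activeNodes.remove(host + ":" + port) != null) {
            // A second lost node on the same host overwrites the first entry.
            inactiveNodes.put(host, "LOST on port " + port);
        }
    }

    int activeCount() {
        return activeNodes.size();
    }

    int lostCount() {
        return inactiveNodes.size();
    }

    public static void main(String[] args) {
        NodeTrackingSketch rm = new NodeTrackingSketch();
        rm.register("host1", 40001);
        rm.register("host1", 40002);
        rm.markLost("host1", 40001);
        rm.markLost("host1", 40002);
        // Two nodes were registered and both were lost, yet only one is counted as lost.
        System.out.println("active=" + rm.activeCount() + " lost=" + rm.lostCount());
    }
}
```

Running main prints active=0 lost=1, which matches the two-NMs-on-one-host case in the description: the counts no longer add up to the two nodes that were registered.
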
> To fix the inconsistency, we should differentiate the two cases below and be consistent for
> both of them:
>  - intentionally multiple NMs per host
>  - NM instances one after another on the same host
> Two possible solutions:
> 1) Introduce a boolean config like "one-node-per-host" (default "true"), and use the host
> to differentiate nodes on the RM if it's true.
> 2) Make it mandatory to have a valid port in the "yarn.nodemanager.address" config. In this
> situation, NM instances one after another on the same host will have the same NodeId, while intentionally
> multiple NMs per host will have different NodeIds.
> Personally I prefer option 1 because it's easier for users.
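
One way to picture option 1 is a single key function shared by the active and inactive maps, switched by the proposed flag. The sketch below rests on assumptions: NodeKeySketch, its oneNodePerHost field, and the rule that a stale expiry must not evict a restarted instance are all hypothetical and are not taken from any YARN patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of option 1; none of these names exist in YARN.
public class NodeKeySketch {
    // Stand-in for the proposed "one-node-per-host" setting.
    private final boolean oneNodePerHost;
    // Both maps use the same key function, so they agree on node identity.
    private final Map<String, String> activeNodes = new HashMap<>();
    private final Map<String, String> inactiveNodes = new HashMap<>();

    NodeKeySketch(boolean oneNodePerHost) {
        this.oneNodePerHost = oneNodePerHost;
    }

    String key(String host, int port) {
        return oneNodePerHost ? host : host + ":" + port;
    }

    void register(String host, int port) {
        String k = key(host, port);
        // A node that comes back under the same key is no longer lost.
        inactiveNodes.remove(k);
        activeNodes.put(k, host + ":" + port);
    }

    void markLost(String host, int port) {
        String k = key(host, port);
        String address = host + ":" + port;
        // Assumption: only move the entry if it still belongs to this instance,
        // so the late expiry of a replaced instance does not evict its successor.
        if (activeNodes.remove(k, address)) {
            inactiveNodes.put(k, address);
        }
    }

    int activeCount() {
        return activeNodes.size();
    }

    int lostCount() {
        return inactiveNodes.size();
    }
}
```

With the flag set to true, a restart on a new port replaces the active entry for that host, and the old instance's expiry is ignored. With the flag set to false, two NMs on one host are tracked separately in both maps, so losing both yields two lost entries.
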

This message was sent by Atlassian JIRA
