hadoop-mapreduce-issues mailing list archives

From "Devaraj K (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3070) NM not able to register with RM after NM restart
Date Fri, 21 Oct 2011 06:42:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132369#comment-13132369 ]

Devaraj K commented on MAPREDUCE-3070:
--------------------------------------

Thanks, Arun and Kamesh, for taking a look at the patch.

bq. I think this can be simplified. We don't need 'reconnected' state.
Agreed, it can be simplified. I will update the patch with a simplified approach.

{quote}
 Essentially an NM should be identified with host+port (see NodeId.hashCode).

 Now on registration we can assume that host+port is unique - now the question is: why isn't
this already working?
{quote}
{code:title=ResourceTrackerService.java|borderStyle=solid}
      if (this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode) != null) {
        throw new IOException("Duplicate registration from the node!");
      }
{code}

If the NodeManager goes down, its entry is removed from this.rmContext.getRMNodes() only after
the expiry interval (YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS) has elapsed. If the same
NodeManager comes back up on the same port before that interval elapses, the RM throws an
IOException saying "Duplicate registration from the node!" and the NM fails to start for that reason.
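
To make the failure mode concrete, here is a minimal standalone sketch of the same check. It is not the actual ResourceTrackerService code: the class DuplicateRegistrationSketch, its methods, the "host:port" string key, and the example host and port values are all hypothetical, and a plain ConcurrentHashMap stands in for rmContext.getRMNodes().

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the RM side of node registration.
public class DuplicateRegistrationSketch {
    // Assumption: nodes are keyed by a "host:port" string instead of a real NodeId.
    private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

    // Same shape as the quoted check: putIfAbsent returns the previous value,
    // so a non-null result means the key is still registered.
    public void register(String hostPort) throws IOException {
        if (nodes.putIfAbsent(hostPort, "RUNNING") != null) {
            throw new IOException("Duplicate registration from the node!");
        }
    }

    // Stand-in for the removal that happens once the expiry interval has elapsed.
    public void expire(String hostPort) {
        nodes.remove(hostPort);
    }

    // Convenience wrapper: true if registration succeeded, false if it was rejected.
    public static boolean tryRegister(DuplicateRegistrationSketch rm, String hostPort) {
        try {
            rm.register(hostPort);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DuplicateRegistrationSketch rm = new DuplicateRegistrationSketch();
        System.out.println("first start:          " + tryRegister(rm, "host1:45454"));
        System.out.println("restart, same port:   " + tryRegister(rm, "host1:45454"));
        System.out.println("restart, other port:  " + tryRegister(rm, "host1:45455"));
        rm.expire("host1:45454");
        System.out.println("same port, after expiry: " + tryRegister(rm, "host1:45454"));
    }
}
```

In this sketch only the second call is rejected, which matches the behaviour described above: the stale entry blocks a restart on the same port until it is expired, while a different port registers as a new node.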


bq. But I agree with Kamesh's observation on MAPREDUCE-3178, we need to fix that as he pointed
out. 

That can be handled; I will address it in the next patch.

bq. But this should already work if the NM comes up on a different port?

Yes, it works fine.
                
> NM not able to register with RM after NM restart
> ------------------------------------------------
>
>                 Key: MAPREDUCE-3070
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3070
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3070.patch
>
>
> After stopping NM gracefully then starting NM, NM registration fails with RM with Duplicate registration from the node! error.
> {noformat} 
> 2011-09-23 01:50:46,705 FATAL nodemanager.NodeManager (NodeManager.java:main(204)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
> 	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:153)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:202)
> Caused by: org.apache.avro.AvroRuntimeException: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
> 	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
> 	... 2 more
> Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
> 	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
> 	at $Proxy13.registerNodeManager(Unknown Source)
> 	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:175)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
> 	... 3 more
> {noformat} 

