hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3070) NM not able to register with RM after NM restart
Date Thu, 20 Oct 2011 22:58:12 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3070:
-------------------------------------

    Status: Open  (was: Patch Available)

I think this can be simplified. We don't need 'reconnected' state.

Essentially an NM should be identified with host+port (see NodeId.hashCode).

On registration we can assume that host+port is unique, so the question is: why isn't
this already working?

That said, I agree with Kamesh's observation on MAPREDUCE-3178; we need to fix that as he
pointed out.

But this should already work if the NM comes up on a different port?
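
To make the host+port idea concrete, here is a minimal sketch of a registry keyed only on host and port. It is not the ResourceManager's actual code: NodeRegistrySketch, NodeKey and register are hypothetical names, and treating a repeat registration from the same host+port as a replacement (rather than an error) is just one possible reading of the suggestion above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and behavior are assumptions, not YARN code.
final class NodeRegistrySketch {

    // Identity of an NM: host + port and nothing else.
    static final class NodeKey {
        final String host;
        final int port;

        NodeKey(String host, int port) {
            this.host = Objects.requireNonNull(host);
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof NodeKey)) return false;
            NodeKey other = (NodeKey) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    // Maps each node identity to the time of its latest registration.
    private final Map<NodeKey, Long> registeredAt = new ConcurrentHashMap<>();

    // Registers (or re-registers) a node. Returns true if an entry with the
    // same host+port already existed and was replaced, false if it is new.
    boolean register(String host, int port, long timestampMillis) {
        Long previous = registeredAt.put(new NodeKey(host, port), timestampMillis);
        return previous != null;
    }

    int size() {
        return registeredAt.size();
    }
}
```

With this shape, an NM that restarts on the same port replaces its old entry, while an NM that comes up on a different port is simply a new key.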
                
> NM not able to register with RM after NM restart
> ------------------------------------------------
>
>                 Key: MAPREDUCE-3070
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3070
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Ravi Teja Ch N V
>            Assignee: Devaraj K
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3070.patch
>
>
> After stopping the NM gracefully and then starting it again, NM registration with the RM fails with a "Duplicate registration from the node!" error.
> {noformat} 
> 2011-09-23 01:50:46,705 FATAL nodemanager.NodeManager (NodeManager.java:main(204)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
> 	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:153)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:202)
> Caused by: org.apache.avro.AvroRuntimeException: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
> 	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
> 	... 2 more
> Caused by: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Duplicate registration from the node!
> 	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:142)
> 	at $Proxy13.registerNodeManager(Unknown Source)
> 	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:175)
> 	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:137)
> 	... 3 more
> {noformat} 


        
